# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is also imported and read with pandas' read_csv function, and the result is assigned to df.

    [46]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns
df = pd.read_csv(r'C:\Users\Rob\Documents\dstats.csv')
print(df)
                            Tm   G  Cmp  Att  Cmp%   Yds  TD  TD%  Int  PD  ...  \
    0        Arizona Cardinals  17  367  561  65.4  3645  30  5.3   13  73  ...   
    1          Atlanta Falcons  17  391  577  67.8  3952  31  5.4   12  77  ...   
    2         Baltimore Ravens  17  397  621  63.9  4742  31  5.0    9  72  ...   
    3            Buffalo Bills  17  297  530  56.0  2771  12  2.3   19  80  ...   
    4        Carolina Panthers  17  337  515  65.4  3266  26  5.0    9  52  ...   
    ..                     ...  ..  ...  ...   ...   ...  ..  ...  ...  ..  ...   
    91   San Francisco 49ers19  16  318  519  61.3  2707  23  4.4   12  75  ...   
    92      Seattle Seahawks19  16  383  598  64.0  4223  19  3.2   16  74  ...   
    93  Tampa Bay Buccaneers19  16  408  664  61.4  4322  30  4.5   12  96  ...   
    94      Tennessee Titans19  16  386  598  64.5  4080  25  4.2   14  72  ...   
    95   Washington Redskins19  16  371  540  68.7  3823  35  6.5   13  52  ...   
    
        Hrry   Hrry%  QBKD   QBKD%  aSk  Prss   Prss%  MTkl   PA    PAA  
    0     61   9.80%    60  10.70%   41   162  25.90%   110  366  384.1  
    1     48   7.60%    39   6.80%   18   105  16.70%   120  459  384.1  
    2     58   8.60%    62  10.00%   34   154  23.00%   115  392  384.1  
    3     93  15.40%    51   9.60%   42   186  30.80%   118  289  384.1  
    4     62  10.90%    48   9.30%   39   149  26.10%   106  404  384.1  
    ..   ...     ...   ...     ...  ...   ...     ...   ...  ...    ...  
    91    88  14.70%    36   6.90%   48   172  28.70%   107  310  384.1  
    92    60   9.20%    38   6.40%   28   126  19.30%   131  398  384.1  
    93    62   8.50%    66   9.90%   47   175  23.90%   118  449  384.1  
    94    72  10.70%    27   4.50%   43   142  21.10%   110  331  384.1  
    95    83  13.60%    45   8.30%   46   174  28.50%   116  435  384.1  
    
    [96 rows x 49 columns]
    
Find the mean of all points allowed; this is added back to the df to use as the baseline later.

    [47]:
     
    np.mean(df['PA'])
    [47]:
    384.1041666666667
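The PAA column already present in the data appears to be exactly this league-wide mean broadcast to every row. A minimal sketch of how such a baseline column could be built — the team names and PA values below are a tiny synthetic stand-in, not the real dstats.csv:

```python
import pandas as pd

# Tiny synthetic stand-in for dstats.csv -- values are illustrative only.
df = pd.DataFrame({
    'Tm': ['Team A', 'Team B', 'Team C'],
    'PA': [366, 459, 289],
})

# Broadcast the league-wide mean of points allowed to every row, giving
# each team the same constant baseline prediction.
df['PAA'] = df['PA'].mean()
print(df)
```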
Check that all the cells in the dataframe are filled.

    [48]:
     
    df.describe()
    [48]:
    G Cmp Att Cmp% Yds TD TD% Int PD Int% ... Air aYAC Bltz Hrry QBKD aSk Prss MTkl PA PAA
    count 96.000000 96.000000 96.000000 96.000000 96.000000 96.00000 96.000000 96.000000 96.000000 96.000000 ... 96.000000 96.000000 96.000000 96.000000 96.000000 96.000000 96.000000 96.000000 96.000000 9.600000e+01
    mean 16.333333 366.750000 568.572917 64.473958 3827.718750 26.12500 4.606250 12.968750 68.968750 2.288542 ... 2294.708333 1933.708333 178.468750 62.020833 49.697917 38.072917 149.791667 107.989583 384.104167 3.841000e+02
    std 0.473879 33.964688 41.337060 3.149812 428.987614 5.55783 0.991417 4.027414 11.484042 0.720507 ... 307.854850 276.118202 50.766328 13.440692 11.606822 9.128607 23.402167 16.409557 59.334463 7.428436e-13
    min 16.000000 297.000000 464.000000 56.000000 2707.000000 12.00000 2.300000 3.000000 43.000000 0.600000 ... 1707.000000 1354.000000 72.000000 35.000000 27.000000 17.000000 95.000000 66.000000 225.000000 3.841000e+02
    25% 16.000000 343.750000 541.000000 62.675000 3577.500000 22.00000 3.900000 10.000000 60.750000 1.800000 ... 2108.000000 1734.250000 143.750000 52.000000 41.000000 31.750000 135.500000 96.750000 351.000000 3.841000e+02
    50% 16.000000 368.000000 562.000000 64.050000 3822.000000 26.00000 4.600000 12.500000 70.000000 2.250000 ... 2258.000000 1906.000000 168.500000 59.500000 50.000000 39.000000 148.000000 109.000000 373.000000 3.841000e+02
    75% 17.000000 390.250000 598.000000 66.650000 4123.000000 30.00000 5.225000 15.000000 77.250000 2.625000 ... 2487.250000 2071.500000 206.750000 71.000000 56.000000 46.000000 165.250000 119.000000 426.000000 3.841000e+02
    max 17.000000 450.000000 680.000000 70.700000 4742.000000 39.00000 7.200000 26.000000 96.000000 4.700000 ... 3118.000000 2793.000000 329.000000 95.000000 80.000000 56.000000 219.000000 143.000000 519.000000 3.841000e+02

    8 rows × 44 columns

One-hot encode the categorical data into numbers to allow better analysis by the machine-learning model.

    [49]:
    df = pd.get_dummies(df)
    df.iloc[:,5:].head(5)
    [49]:
    TD TD% Int PD Int% Y/A AY/A Y/C Y/G Rate ... Prss%_26.80% Prss%_27.50% Prss%_27.60% Prss%_27.90% Prss%_28.50% Prss%_28.60% Prss%_28.70% Prss%_30.50% Prss%_30.80% Prss%_35.10%
    0 30 5.3 13 73 2.3 6.9 6.9 10.6 214.4 93.5 ... 0 0 0 0 0 0 0 0 0 0
    1 31 5.4 12 77 2.1 7.1 7.3 10.5 232.5 97.4 ... 0 0 0 0 0 0 0 0 0 0
    2 31 5.0 9 72 1.4 8.0 8.4 12.6 278.9 99.4 ... 0 0 0 0 0 0 0 0 0 0
    3 12 2.3 19 80 3.6 5.7 4.6 10.2 163.0 65.3 ... 0 0 0 0 0 0 0 0 1 0
    4 26 5.0 9 52 1.7 6.9 7.1 10.6 192.1 95.0 ... 0 0 0 0 0 0 0 0 0 0

    5 rows × 396 columns
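The jump to 396 columns happens because every distinct string value — including the percentage columns stored as text (Bltz%, Hrry%, QBKD%, Prss%) — becomes its own 0/1 indicator. A minimal sketch of the mechanism on a toy frame (values are illustrative, not from dstats.csv):

```python
import pandas as pd

# Toy frame: one categorical column and one numeric column.
small = pd.DataFrame({
    'Tm': ['Cardinals', 'Falcons', 'Cardinals'],
    'PA': [366, 459, 289],
})

# get_dummies leaves numeric columns alone and expands each distinct
# string value into its own indicator column.
encoded = pd.get_dummies(small)
print(encoded.columns.tolist())  # → ['PA', 'Tm_Cardinals', 'Tm_Falcons']
```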

### Choose what is to be predicted

Define the desired variable to model (predict) as labels. Then remove that column from the dataframe and convert the dataframe to an array.

    [50]:
labels = np.array(df['PA'])
df = df.drop('PA', axis=1)
df_list = list(df.columns)
df = np.array(df)
Import the train_test_split function from sklearn.

    [51]:
    from sklearn.model_selection import train_test_split
Create the training and testing splits, and set the size of the testing data.

    [52]:
train_df, test_df, train_labels, test_labels = train_test_split(df, labels, test_size=0.25, random_state=42)
Print the shapes of the splits created above to make sure there are no errors in them.

    [53]:
    print('Training df Shape:', train_df.shape)
    print('Training Labels Shape:', train_labels.shape)
    print('Testing df Shape:', test_df.shape)
    print('Testing Labels Shape:', test_labels.shape)
    Training df Shape: (72, 400)
    Training Labels Shape: (72,)
    Testing df Shape: (24, 400)
    Testing Labels Shape: (24,)
    
### Baseline

Use the PAA (Points Against Average) created earlier as the baseline prediction, and measure the baseline's error against the testing data.

    [54]:
    baseline_preds = test_df[:, df_list.index('PAA')]
    baseline_errors = abs(baseline_preds - test_labels)
    print('Average baseline error: ', round(np.mean(baseline_errors), 2))
    Average baseline error:  53.93
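The baseline logic above, on toy numbers: predict the league-average PA for every team and take the mean absolute miss. The PA values below are illustrative, not the actual test split:

```python
import numpy as np

# Three illustrative actual PA totals from a hypothetical test split.
test_labels = np.array([289.0, 459.0, 366.0])

# The baseline predicts the same league-average PA (384.1) for every team.
baseline_preds = np.full_like(test_labels, 384.1)

# Mean absolute error of the constant baseline.
baseline_errors = np.abs(baseline_preds - test_labels)
print(round(baseline_errors.mean(), 2))  # → 62.7
```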
    
### Random Forest Regression

Import RandomForestRegressor from sklearn. Then set the number of decision trees required, in this case 1000, and fit the model to the training set created above.

    [55]:
    from sklearn.ensemble import RandomForestRegressor
    rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
    rf.fit(train_df, train_labels);
Now make predictions on the testing data and calculate the mean absolute error.

    [56]:
predictions = rf.predict(test_df)
errors = abs(predictions - test_labels)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'points.')
Mean Absolute Error: 27.58 points.
    
Now find the accuracy of the predictions by calculating the mean absolute percentage error (MAPE) and subtracting its mean from 100.

    [57]:
    mape = 100 * (errors / test_labels)
    accuracy = 100 - np.mean(mape)
    print('Accuracy:', round(accuracy, 2), '%.')
    Accuracy: 92.56 %.
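A worked toy example of the accuracy metric: MAPE averages the per-team percentage errors, and accuracy is 100 minus that mean. The numbers below are illustrative, not the model's actual predictions:

```python
import numpy as np

# Illustrative predictions and actual values, not the model's output.
predictions = np.array([360.0, 410.0, 300.0])
test_labels = np.array([400.0, 410.0, 250.0])

errors = np.abs(predictions - test_labels)    # [40., 0., 50.]
mape = 100 * (errors / test_labels)           # [10., 0., 20.] percent
accuracy = 100 - np.mean(mape)                # 100 - 10 = 90
print('Accuracy:', round(accuracy, 2), '%.')  # → Accuracy: 90.0 %.
```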
    
### List of Importances

Make a list of the variables and their importance in the prediction, sorted from most important to least.

    [59]:
importances = list(rf.feature_importances_)
# Use 'feature' as the loop variable so it does not shadow the df array
feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(df_list, importances)]
feature_importances = sorted(feature_importances, key=lambda x: x[1], reverse=True)
[print('Variable: {:20} Importance: {}'.format(*pair)) for pair in feature_importances];
    Variable: ANY/A                Importance: 0.24
    Variable: Rate                 Importance: 0.19
    Variable: EXP                  Importance: 0.09
    Variable: AY/A                 Importance: 0.06
    Variable: NY/A                 Importance: 0.04
    Variable: TD                   Importance: 0.03
    Variable: Attr                 Importance: 0.03
    Variable: Ydsr                 Importance: 0.03
    Variable: TDr                  Importance: 0.03
    Variable: RY/G                 Importance: 0.03
    Variable: aTD                  Importance: 0.03
    Variable: Cmp                  Importance: 0.01
    Variable: Cmp%                 Importance: 0.01
    Variable: Yds                  Importance: 0.01
    Variable: TD%                  Importance: 0.01
    Variable: Y/G                  Importance: 0.01
    Variable: Yds.1                Importance: 0.01
    Variable: QBHits               Importance: 0.01
    Variable: REXP                 Importance: 0.01
    Variable: aCmp                 Importance: 0.01
    Variable: aYds                 Importance: 0.01
    Variable: aDADOT               Importance: 0.01
    Variable: aYAC                 Importance: 0.01
    Variable: Bltz                 Importance: 0.01
    Variable: MTkl                 Importance: 0.01
    Variable: G                    Importance: 0.0
    Variable: Att                  Importance: 0.0
    Variable: Int                  Importance: 0.0
    Variable: PD                   Importance: 0.0
    Variable: Int%                 Importance: 0.0
    Variable: Y/A                  Importance: 0.0
    Variable: Y/C                  Importance: 0.0
    Variable: Sk                   Importance: 0.0
    Variable: TFL                  Importance: 0.0
    Variable: Sk%                  Importance: 0.0
    Variable: RY/A                 Importance: 0.0
    Variable: aAtt                 Importance: 0.0
    Variable: Air                  Importance: 0.0
    Variable: Hrry                 Importance: 0.0
    Variable: QBKD                 Importance: 0.0
    Variable: aSk                  Importance: 0.0
    Variable: Prss                 Importance: 0.0
    Variable: PAA                  Importance: 0.0
    Variable: Tm_Arizona Cardinals Importance: 0.0
    Variable: Tm_Arizona Cardinals19 Importance: 0.0
    Variable: Tm_Arizona Cardinals20 Importance: 0.0
    Variable: Tm_Atlanta Falcons   Importance: 0.0
    Variable: Tm_Atlanta Falcons19 Importance: 0.0
    Variable: Tm_Atlanta Falcons20 Importance: 0.0
    Variable: Tm_Baltimore Ravens  Importance: 0.0
    Variable: Tm_Baltimore Ravens19 Importance: 0.0
    Variable: Tm_Baltimore Ravens20 Importance: 0.0
    Variable: Tm_Buffalo Bills     Importance: 0.0
    Variable: Tm_Buffalo Bills19   Importance: 0.0
    Variable: Tm_Buffalo Bills20   Importance: 0.0
    Variable: Tm_Carolina Panthers Importance: 0.0
    Variable: Tm_Carolina Panthers19 Importance: 0.0
    Variable: Tm_Carolina Panthers20 Importance: 0.0
    Variable: Tm_Chicago Bears     Importance: 0.0
    Variable: Tm_Chicago Bears19   Importance: 0.0
    Variable: Tm_Chicago Bears20   Importance: 0.0
    Variable: Tm_Cincinnati Bengals Importance: 0.0
    Variable: Tm_Cincinnati Bengals19 Importance: 0.0
    Variable: Tm_Cincinnati Bengals20 Importance: 0.0
    Variable: Tm_Cleveland Browns  Importance: 0.0
    Variable: Tm_Cleveland Browns19 Importance: 0.0
    Variable: Tm_Cleveland Browns20 Importance: 0.0
    Variable: Tm_Dallas Cowboys    Importance: 0.0
    Variable: Tm_Dallas Cowboys19  Importance: 0.0
    Variable: Tm_Dallas Cowboys20  Importance: 0.0
    Variable: Tm_Denver Broncos    Importance: 0.0
    Variable: Tm_Denver Broncos19  Importance: 0.0
    Variable: Tm_Denver Broncos20  Importance: 0.0
    Variable: Tm_Detroit Lions     Importance: 0.0
    Variable: Tm_Detroit Lions19   Importance: 0.0
    Variable: Tm_Detroit Lions20   Importance: 0.0
    Variable: Tm_Green Bay Packers Importance: 0.0
    Variable: Tm_Green Bay Packers19 Importance: 0.0
    Variable: Tm_Green Bay Packers20 Importance: 0.0
    Variable: Tm_Houston Texans    Importance: 0.0
    Variable: Tm_Houston Texans19  Importance: 0.0
    Variable: Tm_Houston Texans20  Importance: 0.0
    Variable: Tm_Indianapolis Colts Importance: 0.0
    Variable: Tm_Indianapolis Colts19 Importance: 0.0
    Variable: Tm_Indianapolis Colts20 Importance: 0.0
    Variable: Tm_Jacksonville Jaguars Importance: 0.0
    Variable: Tm_Jacksonville Jaguars19 Importance: 0.0
    Variable: Tm_Jacksonville Jaguars20 Importance: 0.0
    Variable: Tm_Kansas City Chiefs Importance: 0.0
    Variable: Tm_Kansas City Chiefs19 Importance: 0.0
    Variable: Tm_Kansas City Chiefs20 Importance: 0.0
    Variable: Tm_Las Vegas Raiders Importance: 0.0
    Variable: Tm_Las Vegas Raiders20 Importance: 0.0
    Variable: Tm_Los Angeles Chargers Importance: 0.0
    Variable: Tm_Los Angeles Chargers19 Importance: 0.0
    Variable: Tm_Los Angeles Chargers20 Importance: 0.0
    Variable: Tm_Los Angeles Rams  Importance: 0.0
    Variable: Tm_Los Angeles Rams19 Importance: 0.0
    Variable: Tm_Los Angeles Rams20 Importance: 0.0
    Variable: Tm_Miami Dolphins    Importance: 0.0
    Variable: Tm_Miami Dolphins19  Importance: 0.0
    Variable: Tm_Miami Dolphins20  Importance: 0.0
    Variable: Tm_Minnesota Vikings Importance: 0.0
    Variable: Tm_Minnesota Vikings19 Importance: 0.0
    Variable: Tm_Minnesota Vikings20 Importance: 0.0
    Variable: Tm_New England Patriots Importance: 0.0
    Variable: Tm_New England Patriots19 Importance: 0.0
    Variable: Tm_New England Patriots20 Importance: 0.0
    Variable: Tm_New Orleans Saints Importance: 0.0
    Variable: Tm_New Orleans Saints19 Importance: 0.0
    Variable: Tm_New Orleans Saints20 Importance: 0.0
    Variable: Tm_New York Giants   Importance: 0.0
    Variable: Tm_New York Giants19 Importance: 0.0
    Variable: Tm_New York Giants20 Importance: 0.0
    Variable: Tm_New York Jets     Importance: 0.0
    Variable: Tm_New York Jets19   Importance: 0.0
    Variable: Tm_New York Jets20   Importance: 0.0
    Variable: Tm_Oakland Raiders19 Importance: 0.0
    Variable: Tm_Philadelphia Eagles Importance: 0.0
    Variable: Tm_Philadelphia Eagles19 Importance: 0.0
    Variable: Tm_Philadelphia Eagles20 Importance: 0.0
    Variable: Tm_Pittsburgh Steelers Importance: 0.0
    Variable: Tm_Pittsburgh Steelers19 Importance: 0.0
    Variable: Tm_Pittsburgh Steelers20 Importance: 0.0
    Variable: Tm_San Francisco 49ers Importance: 0.0
    Variable: Tm_San Francisco 49ers19 Importance: 0.0
    Variable: Tm_San Francisco 49ers20 Importance: 0.0
    Variable: Tm_Seattle Seahawks  Importance: 0.0
    Variable: Tm_Seattle Seahawks19 Importance: 0.0
    Variable: Tm_Seattle Seahawks20 Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers19 Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers20 Importance: 0.0
    Variable: Tm_Tennessee Titans  Importance: 0.0
    Variable: Tm_Tennessee Titans19 Importance: 0.0
    Variable: Tm_Tennessee Titans20 Importance: 0.0
    Variable: Tm_Washington Football Team Importance: 0.0
    Variable: Tm_Washington Football Team20 Importance: 0.0
    Variable: Tm_Washington Redskins19 Importance: 0.0
    Variable: Bltz%_12.10%         Importance: 0.0
    Variable: Bltz%_13.70%         Importance: 0.0
    Variable: Bltz%_16.30%         Importance: 0.0
    Variable: Bltz%_16.40%         Importance: 0.0
    Variable: Bltz%_17.10%         Importance: 0.0
    Variable: Bltz%_17.50%         Importance: 0.0
    Variable: Bltz%_18.00%         Importance: 0.0
    Variable: Bltz%_19.50%         Importance: 0.0
    Variable: Bltz%_19.80%         Importance: 0.0
    Variable: Bltz%_19.90%         Importance: 0.0
    Variable: Bltz%_20.20%         Importance: 0.0
    Variable: Bltz%_20.50%         Importance: 0.0
    Variable: Bltz%_20.90%         Importance: 0.0
    Variable: Bltz%_21.30%         Importance: 0.0
    Variable: Bltz%_21.40%         Importance: 0.0
    Variable: Bltz%_22.00%         Importance: 0.0
    Variable: Bltz%_22.10%         Importance: 0.0
    Variable: Bltz%_22.20%         Importance: 0.0
    Variable: Bltz%_22.40%         Importance: 0.0
    Variable: Bltz%_22.60%         Importance: 0.0
    Variable: Bltz%_22.70%         Importance: 0.0
    Variable: Bltz%_22.80%         Importance: 0.0
    Variable: Bltz%_22.90%         Importance: 0.0
    Variable: Bltz%_23.20%         Importance: 0.0
    Variable: Bltz%_23.30%         Importance: 0.0
    Variable: Bltz%_23.50%         Importance: 0.0
    Variable: Bltz%_23.70%         Importance: 0.0
    Variable: Bltz%_23.90%         Importance: 0.0
    Variable: Bltz%_24.00%         Importance: 0.0
    Variable: Bltz%_24.10%         Importance: 0.0
    Variable: Bltz%_24.40%         Importance: 0.0
    Variable: Bltz%_24.50%         Importance: 0.0
    Variable: Bltz%_24.60%         Importance: 0.0
    Variable: Bltz%_24.70%         Importance: 0.0
    Variable: Bltz%_24.80%         Importance: 0.0
    Variable: Bltz%_24.90%         Importance: 0.0
    Variable: Bltz%_25.00%         Importance: 0.0
    Variable: Bltz%_25.10%         Importance: 0.0
    Variable: Bltz%_25.30%         Importance: 0.0
    Variable: Bltz%_26.00%         Importance: 0.0
    Variable: Bltz%_26.10%         Importance: 0.0
    Variable: Bltz%_26.60%         Importance: 0.0
    Variable: Bltz%_26.80%         Importance: 0.0
    Variable: Bltz%_26.90%         Importance: 0.0
    Variable: Bltz%_27.10%         Importance: 0.0
    Variable: Bltz%_27.30%         Importance: 0.0
    Variable: Bltz%_27.90%         Importance: 0.0
    Variable: Bltz%_28.00%         Importance: 0.0
    Variable: Bltz%_28.10%         Importance: 0.0
    Variable: Bltz%_28.50%         Importance: 0.0
    Variable: Bltz%_28.70%         Importance: 0.0
    Variable: Bltz%_29.10%         Importance: 0.0
    Variable: Bltz%_29.80%         Importance: 0.0
    Variable: Bltz%_31.00%         Importance: 0.0
    Variable: Bltz%_31.10%         Importance: 0.0
    Variable: Bltz%_31.50%         Importance: 0.0
    Variable: Bltz%_31.60%         Importance: 0.0
    Variable: Bltz%_31.80%         Importance: 0.0
    Variable: Bltz%_32.40%         Importance: 0.0
    Variable: Bltz%_32.50%         Importance: 0.0
    Variable: Bltz%_32.70%         Importance: 0.0
    Variable: Bltz%_32.90%         Importance: 0.0
    Variable: Bltz%_33.50%         Importance: 0.0
    Variable: Bltz%_33.60%         Importance: 0.0
    Variable: Bltz%_33.70%         Importance: 0.0
    Variable: Bltz%_35.70%         Importance: 0.0
    Variable: Bltz%_35.80%         Importance: 0.0
    Variable: Bltz%_35.90%         Importance: 0.0
    Variable: Bltz%_36.90%         Importance: 0.0
    Variable: Bltz%_37.10%         Importance: 0.0
    Variable: Bltz%_38.20%         Importance: 0.0
    Variable: Bltz%_38.40%         Importance: 0.0
    Variable: Bltz%_39.00%         Importance: 0.0
    Variable: Bltz%_39.20%         Importance: 0.0
    Variable: Bltz%_39.40%         Importance: 0.0
    Variable: Bltz%_39.60%         Importance: 0.0
    Variable: Bltz%_39.70%         Importance: 0.0
    Variable: Bltz%_40.30%         Importance: 0.0
    Variable: Bltz%_40.80%         Importance: 0.0
    Variable: Bltz%_43.40%         Importance: 0.0
    Variable: Bltz%_44.10%         Importance: 0.0
    Variable: Bltz%_54.90%         Importance: 0.0
    Variable: Hrry%_10.00%         Importance: 0.0
    Variable: Hrry%_10.20%         Importance: 0.0
    Variable: Hrry%_10.40%         Importance: 0.0
    Variable: Hrry%_10.50%         Importance: 0.0
    Variable: Hrry%_10.60%         Importance: 0.0
    Variable: Hrry%_10.70%         Importance: 0.0
    Variable: Hrry%_10.80%         Importance: 0.0
    Variable: Hrry%_10.90%         Importance: 0.0
    Variable: Hrry%_11.10%         Importance: 0.0
    Variable: Hrry%_11.20%         Importance: 0.0
    Variable: Hrry%_11.30%         Importance: 0.0
    Variable: Hrry%_11.50%         Importance: 0.0
    Variable: Hrry%_11.60%         Importance: 0.0
    Variable: Hrry%_11.80%         Importance: 0.0
    Variable: Hrry%_12.10%         Importance: 0.0
    Variable: Hrry%_12.20%         Importance: 0.0
    Variable: Hrry%_12.40%         Importance: 0.0
    Variable: Hrry%_12.50%         Importance: 0.0
    Variable: Hrry%_12.70%         Importance: 0.0
    Variable: Hrry%_12.90%         Importance: 0.0
    Variable: Hrry%_13.00%         Importance: 0.0
    Variable: Hrry%_13.10%         Importance: 0.0
    Variable: Hrry%_13.60%         Importance: 0.0
    Variable: Hrry%_14.30%         Importance: 0.0
    Variable: Hrry%_14.50%         Importance: 0.0
    Variable: Hrry%_14.70%         Importance: 0.0
    Variable: Hrry%_15.40%         Importance: 0.0
    Variable: Hrry%_5.60%          Importance: 0.0
    Variable: Hrry%_6.10%          Importance: 0.0
    Variable: Hrry%_6.70%          Importance: 0.0
    Variable: Hrry%_6.80%          Importance: 0.0
    Variable: Hrry%_7.10%          Importance: 0.0
    Variable: Hrry%_7.20%          Importance: 0.0
    Variable: Hrry%_7.30%          Importance: 0.0
    Variable: Hrry%_7.50%          Importance: 0.0
    Variable: Hrry%_7.60%          Importance: 0.0
    Variable: Hrry%_7.70%          Importance: 0.0
    Variable: Hrry%_7.80%          Importance: 0.0
    Variable: Hrry%_7.90%          Importance: 0.0
    Variable: Hrry%_8.00%          Importance: 0.0
    Variable: Hrry%_8.10%          Importance: 0.0
    Variable: Hrry%_8.20%          Importance: 0.0
    Variable: Hrry%_8.30%          Importance: 0.0
    Variable: Hrry%_8.40%          Importance: 0.0
    Variable: Hrry%_8.50%          Importance: 0.0
    Variable: Hrry%_8.60%          Importance: 0.0
    Variable: Hrry%_8.70%          Importance: 0.0
    Variable: Hrry%_8.80%          Importance: 0.0
    Variable: Hrry%_8.90%          Importance: 0.0
    Variable: Hrry%_9.00%          Importance: 0.0
    Variable: Hrry%_9.10%          Importance: 0.0
    Variable: Hrry%_9.20%          Importance: 0.0
    Variable: Hrry%_9.30%          Importance: 0.0
    Variable: Hrry%_9.40%          Importance: 0.0
    Variable: Hrry%_9.60%          Importance: 0.0
    Variable: Hrry%_9.70%          Importance: 0.0
    Variable: Hrry%_9.80%          Importance: 0.0
    Variable: QBKD%_10.00%         Importance: 0.0
    Variable: QBKD%_10.10%         Importance: 0.0
    Variable: QBKD%_10.20%         Importance: 0.0
    Variable: QBKD%_10.70%         Importance: 0.0
    Variable: QBKD%_10.80%         Importance: 0.0
    Variable: QBKD%_10.90%         Importance: 0.0
    Variable: QBKD%_11.00%         Importance: 0.0
    Variable: QBKD%_11.10%         Importance: 0.0
    Variable: QBKD%_11.20%         Importance: 0.0
    Variable: QBKD%_11.30%         Importance: 0.0
    Variable: QBKD%_11.70%         Importance: 0.0
    Variable: QBKD%_11.80%         Importance: 0.0
    Variable: QBKD%_12.00%         Importance: 0.0
    Variable: QBKD%_12.70%         Importance: 0.0
    Variable: QBKD%_12.90%         Importance: 0.0
    Variable: QBKD%_15.20%         Importance: 0.0
    Variable: QBKD%_4.50%          Importance: 0.0
    Variable: QBKD%_5.10%          Importance: 0.0
    Variable: QBKD%_5.30%          Importance: 0.0
    Variable: QBKD%_5.40%          Importance: 0.0
    Variable: QBKD%_5.90%          Importance: 0.0
    Variable: QBKD%_6.00%          Importance: 0.0
    Variable: QBKD%_6.20%          Importance: 0.0
    Variable: QBKD%_6.30%          Importance: 0.0
    Variable: QBKD%_6.40%          Importance: 0.0
    Variable: QBKD%_6.50%          Importance: 0.0
    Variable: QBKD%_6.60%          Importance: 0.0
    Variable: QBKD%_6.70%          Importance: 0.0
    Variable: QBKD%_6.80%          Importance: 0.0
    Variable: QBKD%_6.90%          Importance: 0.0
    Variable: QBKD%_7.00%          Importance: 0.0
    Variable: QBKD%_7.10%          Importance: 0.0
    Variable: QBKD%_7.20%          Importance: 0.0
    Variable: QBKD%_7.30%          Importance: 0.0
    Variable: QBKD%_7.40%          Importance: 0.0
    Variable: QBKD%_7.50%          Importance: 0.0
    Variable: QBKD%_7.60%          Importance: 0.0
    Variable: QBKD%_7.80%          Importance: 0.0
    Variable: QBKD%_7.90%          Importance: 0.0
    Variable: QBKD%_8.00%          Importance: 0.0
    Variable: QBKD%_8.10%          Importance: 0.0
    Variable: QBKD%_8.30%          Importance: 0.0
    Variable: QBKD%_8.40%          Importance: 0.0
    Variable: QBKD%_8.50%          Importance: 0.0
    Variable: QBKD%_8.60%          Importance: 0.0
    Variable: QBKD%_8.70%          Importance: 0.0
    Variable: QBKD%_8.80%          Importance: 0.0
    Variable: QBKD%_8.90%          Importance: 0.0
    Variable: QBKD%_9.00%          Importance: 0.0
    Variable: QBKD%_9.20%          Importance: 0.0
    Variable: QBKD%_9.30%          Importance: 0.0
    Variable: QBKD%_9.50%          Importance: 0.0
    Variable: QBKD%_9.60%          Importance: 0.0
    Variable: QBKD%_9.70%          Importance: 0.0
    Variable: QBKD%_9.80%          Importance: 0.0
    Variable: QBKD%_9.90%          Importance: 0.0
    Variable: Prss%_16.50%         Importance: 0.0
    Variable: Prss%_16.70%         Importance: 0.0
    Variable: Prss%_17.50%         Importance: 0.0
    Variable: Prss%_17.60%         Importance: 0.0
    Variable: Prss%_18.10%         Importance: 0.0
    Variable: Prss%_18.40%         Importance: 0.0
    Variable: Prss%_18.80%         Importance: 0.0
    Variable: Prss%_19.00%         Importance: 0.0
    Variable: Prss%_19.30%         Importance: 0.0
    Variable: Prss%_19.60%         Importance: 0.0
    Variable: Prss%_19.90%         Importance: 0.0
    Variable: Prss%_20.10%         Importance: 0.0
    Variable: Prss%_20.20%         Importance: 0.0
    Variable: Prss%_20.50%         Importance: 0.0
    Variable: Prss%_20.70%         Importance: 0.0
    Variable: Prss%_21.10%         Importance: 0.0
    Variable: Prss%_21.30%         Importance: 0.0
    Variable: Prss%_21.40%         Importance: 0.0
    Variable: Prss%_21.50%         Importance: 0.0
    Variable: Prss%_21.70%         Importance: 0.0
    Variable: Prss%_21.80%         Importance: 0.0
    Variable: Prss%_21.90%         Importance: 0.0
    Variable: Prss%_22.10%         Importance: 0.0
    Variable: Prss%_22.20%         Importance: 0.0
    Variable: Prss%_22.40%         Importance: 0.0
    Variable: Prss%_22.60%         Importance: 0.0
    Variable: Prss%_22.80%         Importance: 0.0
    Variable: Prss%_22.90%         Importance: 0.0
    Variable: Prss%_23.00%         Importance: 0.0
    Variable: Prss%_23.10%         Importance: 0.0
    Variable: Prss%_23.30%         Importance: 0.0
    Variable: Prss%_23.40%         Importance: 0.0
    Variable: Prss%_23.50%         Importance: 0.0
    Variable: Prss%_23.60%         Importance: 0.0
    Variable: Prss%_23.70%         Importance: 0.0
    Variable: Prss%_23.80%         Importance: 0.0
    Variable: Prss%_23.90%         Importance: 0.0
    Variable: Prss%_24.00%         Importance: 0.0
    Variable: Prss%_24.10%         Importance: 0.0
    Variable: Prss%_24.20%         Importance: 0.0
    Variable: Prss%_24.30%         Importance: 0.0
    Variable: Prss%_24.40%         Importance: 0.0
    Variable: Prss%_24.50%         Importance: 0.0
    Variable: Prss%_24.60%         Importance: 0.0
    Variable: Prss%_24.70%         Importance: 0.0
    Variable: Prss%_24.80%         Importance: 0.0
    Variable: Prss%_25.00%         Importance: 0.0
    Variable: Prss%_25.20%         Importance: 0.0
    Variable: Prss%_25.50%         Importance: 0.0
    Variable: Prss%_25.60%         Importance: 0.0
    Variable: Prss%_25.80%         Importance: 0.0
    Variable: Prss%_25.90%         Importance: 0.0
    Variable: Prss%_26.10%         Importance: 0.0
    Variable: Prss%_26.20%         Importance: 0.0
    Variable: Prss%_26.30%         Importance: 0.0
    Variable: Prss%_26.40%         Importance: 0.0
    Variable: Prss%_26.80%         Importance: 0.0
    Variable: Prss%_27.50%         Importance: 0.0
    Variable: Prss%_27.60%         Importance: 0.0
    Variable: Prss%_27.90%         Importance: 0.0
    Variable: Prss%_28.50%         Importance: 0.0
    Variable: Prss%_28.60%         Importance: 0.0
    Variable: Prss%_28.70%         Importance: 0.0
    Variable: Prss%_30.50%         Importance: 0.0
    Variable: Prss%_30.80%         Importance: 0.0
    Variable: Prss%_35.10%         Importance: 0.0
    
    # Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

    All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is imported and read with pandas' read_csv function, and the resulting dataframe is stored as df.

    [205]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns
    df = pd.read_csv(r'C:\Users\Rob\Documents\Prediction.csv')
    print(df)
                        Teams  G  Pts  Prediction  Prediction/G  Pts/G  P/G*17
    0       Arizona Cardinals  7  156      419.10         24.65   22.3  378.86
    1         Atlanta Falcons  7  163      354.03         20.83   23.3  395.86
    2        Baltimore Ravens  7  181      417.33         24.55   25.9  439.57
    3           Buffalo Bills  6  176      507.74         29.87   29.3  498.67
    4       Carolina Panthers  7  124      257.84         15.17   17.7  301.14
    5           Chicago Bears  6   93      276.56         16.27   15.5  263.50
    6      Cincinnati Bengals  7  173      433.40         25.49   24.7  420.14
    7        Cleveland Browns  7  168      464.37         27.32   24.0  408.00
    8          Dallas Cowboys  7  134      319.45         18.79   19.1  325.43
    9          Denver Broncos  7  100      339.80         19.99   14.3  242.86
    10          Detroit Lions  6  146      347.52         20.44   24.3  413.67
    11      Green Bay Packers  7  128      375.83         22.11   18.3  310.86
    12         Houston Texans  6  106      346.00         20.35   17.7  300.33
    13     Indianapolis Colts  7  113      397.78         23.40   16.1  274.43
    14   Jacksonville Jaguars  7  155      417.01         24.53   22.1  376.43
    15     Kansas City Chiefs  7  223      462.52         27.21   31.9  541.57
    16      Las Vegas Raiders  6  163      443.20         26.07   27.2  461.83
    17   Los Angeles Chargers  7  164      429.25         25.25   23.4  398.29
    18       Los Angeles Rams  6  104      336.85         19.81   17.3  294.67
    19         Miami Dolphins  7  147      420.28         24.72   21.0  357.00
    20      Minnesota Vikings  6  139      389.33         22.90   23.2  393.83
    21   New England Patriots  6  141      404.85         23.81   23.5  399.50
    22     New Orleans Saints  7  175      465.84         27.40   25.0  425.00
    23        New York Giants  7  150      391.79         23.05   21.4  364.29
    24          New York Jets  7  159      421.97         24.82   22.7  386.14
    25    Philadelphia Eagles  6  161      432.22         25.42   26.8  456.17
    26    Pittsburgh Steelers  7  107      318.67         18.75   15.3  259.86
    27    San Francisco 49ers  7  145      412.43         24.26   20.7  352.14
    28       Seattle Seahawks  7  183      396.31         23.31   26.1  444.43
    29   Tampa Bay Buccaneers  7  124      383.57         22.56   17.7  301.14
    30       Tennessee Titans  6  115      275.78         16.22   19.2  279.29
    31  Washington Commanders  7  125      372.02         21.88   17.9  303.57
    
    ### Scatter Graph

    Scatter plot of points scored per game against predicted points scored per game.

    [214]:
     
    plt.scatter(df['Pts/G'], df['Prediction/G'], c = df['Prediction/G'], cmap = 'Reds')
    plt.xlabel('Pts/G')
    plt.ylabel('Prediction/G')
    plt.show()
    Define the columns to make the scripting quicker.

    [116]:
     
    pg17 = df['P/G*17']
    pg17 = np.array(pg17).reshape((-1,1))
    pre = df['Prediction']
    preg = df['Prediction/G']
    preg = np.array(preg).reshape(-1,1)
    pg = df['Pts/G']
    ### Linear Regression

    Create and fit a linear regression on the per-game data.

    [117]:
     
    model = LinearRegression().fit(preg, pg)

    r_sq = model.score(preg, pg)
    print(f"coefficient of determination: {r_sq}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")
    coefficient of determination: 0.5473165407302725
    intercept: 0.7686812178132065
    slope: [0.91671526]
    
    [208]:
     
    yr_pro = df['Pts/G']*17
    preg = df['Prediction']/17

    ### Projection and Prediction Comparison

    First, display the difference between the prediction and the projected points allowed. In this model it is appropriate to do this for both full-year and per-game data, to check that they share the same correlation coefficient. The per-game data is more valuable because it makes it easier to see how different the predictions and projections are: NFL scores usually move in increments of 3 or 7, while 2, 6 and 8 also occur but are less common and situational, which cannot be controlled for by this model. The sum of absolute differences and the sum of squared differences are also provided.

    [209]:
     
    print("difference:", pre - yr_pro)
    print("SAD:", np.sum(np.abs(pre - yr_pro)))
    print("SSD:", np.sum(np.square(pre - yr_pro)))
    print("correlation:", np.corrcoef(np.array((pre, yr_pro)))[0, 1])
    difference: 0      40.00
    1     -42.07
    2     -22.97
    3       9.64
    4     -43.06
    5      13.06
    6      13.50
    7      56.37
    8      -5.25
    9      96.70
    10    -65.58
    11     64.73
    12     45.10
    13    124.08
    14     41.31
    15    -79.78
    16    -19.20
    17     31.45
    18     42.75
    19     63.28
    20     -5.07
    21      5.35
    22     40.84
    23     27.99
    24     36.07
    25    -23.38
    26     58.57
    27     60.53
    28    -47.39
    29     82.67
    30    -50.62
    31     67.72
    dtype: float64
    SAD: 1426.0800000000002
    SSD: 87611.2276
    correlation: 0.7398088722661882
    
    [210]:
     
    print("difference:", preg - pg)
    print("Sum All Differences:", np.sum(np.abs(preg - pg)))
    print("Sum Squared Differences:", np.sum(np.square(preg - pg)))
    print("correlation:", np.corrcoef(np.array((preg, pg)))[0, 1])
    difference: 0     2.352941
    1    -2.474706
    2    -1.351176
    3     0.567059
    4    -2.532941
    5     0.768235
    6     0.794118
    7     3.315882
    8    -0.308824
    9     5.688235
    10   -3.857647
    11    3.807647
    12    2.652941
    13    7.298824
    14    2.430000
    15   -4.692941
    16   -1.129412
    17    1.850000
    18    2.514706
    19    3.722353
    20   -0.298235
    21    0.314706
    22    2.402353
    23    1.646471
    24    2.121765
    25   -1.375294
    26    3.445294
    27    3.560588
    28   -2.787647
    29    4.862941
    30   -2.977647
    31    3.983529
    dtype: float64
    Sum All Differences: 83.88705882352943
    Sum Squared Differences: 303.1530366782007
    correlation: 0.7398088722661882
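    The matching coefficients in the two cells above are expected: the Pearson correlation is invariant under positive linear rescaling, so multiplying one series by 17 and dividing the other by 17 cannot change it. A quick demonstration on arbitrary stand-in data:

    ```python
    import numpy as np

    rng = np.random.RandomState(0)
    a = rng.rand(32)
    b = rng.rand(32)

    # Scaling either series by a positive constant leaves the
    # Pearson correlation coefficient unchanged
    r_raw = np.corrcoef(a, b)[0, 1]
    r_scaled = np.corrcoef(a * 17, b / 17)[0, 1]
    assert np.isclose(r_raw, r_scaled)
    ```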
    
    ### New Dataframe

    Make a new dataframe consisting of the teams and their corresponding prediction/game minus their points/game so far.

    [212]:
     
    Teams = ('Atlanta Falcons', 'Buffalo Bills', 'Carolina Panthers', 'Chicago Bears', 'Cincinnati Bengals', 'Cleveland Browns', 'Indianapolis Colts', 'Arizona Cardinals', 'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers', 'Houston Texans', 'Jacksonville Jaguars', 'Kansas City Chiefs', 'Miami Dolphins', 'Minnesota Vikings', 'New Orleans Saints', 'New England Patriots', 'New York Giants', 'New York Jets', 'Tennessee Titans', 'Philadelphia Eagles', 'Pittsburgh Steelers', 'Las Vegas Raiders', 'Los Angeles Rams', 'Baltimore Ravens', 'Los Angeles Chargers', 'Seattle Seahawks', 'San Francisco 49ers', 'Tampa Bay Buccaneers', 'Washington Commanders')
    df1 = pd.DataFrame(columns=['Teams'])
    df1['Teams'] = Teams
    diff = preg - pg
    df1['P/G O Diff'] = diff
    df1_sorted = df1.sort_values('Teams').reset_index(drop=True)
    df1_sorted
    [212]:
    Teams P/G O Diff
    0 Arizona Cardinals 3.315882
    1 Atlanta Falcons 2.352941
    2 Baltimore Ravens 3.445294
    3 Buffalo Bills -2.474706
    4 Carolina Panthers -1.351176
    5 Chicago Bears 0.567059
    6 Cincinnati Bengals -2.532941
    7 Cleveland Browns 0.768235
    8 Dallas Cowboys -0.308824
    9 Denver Broncos 5.688235
    10 Detroit Lions -3.857647
    11 Green Bay Packers 3.807647
    12 Houston Texans 2.652941
    13 Indianapolis Colts 0.794118
    14 Jacksonville Jaguars 7.298824
    15 Kansas City Chiefs 2.430000
    16 Las Vegas Raiders 2.121765
    17 Los Angeles Chargers 3.560588
    18 Los Angeles Rams -1.375294
    19 Miami Dolphins -4.692941
    20 Minnesota Vikings -1.129412
    21 New England Patriots 2.514706
    22 New Orleans Saints 1.850000
    23 New York Giants 3.722353
    24 New York Jets -0.298235
    25 Philadelphia Eagles 2.402353
    26 Pittsburgh Steelers 1.646471
    27 San Francisco 49ers 4.862941
    28 Seattle Seahawks -2.787647
    29 Tampa Bay Buccaneers -2.977647
    30 Tennessee Titans 0.314706
    31 Washington Commanders 3.983529
    ### Barchart

    Plot a bar chart from the data in the new dataframe (team name and their prediction/game minus points/game).

    [213]:
     
    plt.figure(figsize=(10,5))
    plt.bar(df1_sorted['Teams'], df1_sorted['P/G O Diff'],color = ['firebrick','darkred','purple','blue','cyan', 'orange', 'brown','darkorange', 'midnightblue', 'darkorange', 'cornflowerblue', 'forestgreen', 'darkblue', 'blue', 'turquoise', 'red','black','dodgerblue','blue','aqua', 'darkviolet','midnightblue','black', 'blue','darkgreen','lime','black','darkred','darkslategray','darkred','darkturquoise','maroon' ])
    plt.axhline(y = 0, color = 'black', linestyle = '-')
    plt.xticks(rotation=90)
    plt.show()
    ### Import Dataframe

    Import the dataframe showing the teams and their points-allowed differential created previously.

    [123]:
     
    df2 = pd.read_csv(r'C:\Users\Rob\Documents\DPredDiff.csv')
    df2
    [123]:
    Teams P/G Diff
    0 Arizona Cardinals 1.346471
    1 Atlanta Falcons -0.131765
    2 Baltimore Ravens 1.464118
    3 Buffalo Bills -0.903529
    4 Carolina Panthers -1.268235
    5 Chicago Bears -2.062941
    6 Cincinnati Bengals -0.714706
    7 Cleveland Browns 1.087059
    8 Dallas Cowboys -3.148824
    9 Denver Broncos -1.874706
    10 Detroit Lions 4.165882
    11 Green Bay Packers -3.542353
    12 Houston Texans -1.117059
    13 Indianapolis Colts -3.470000
    14 Jacksonville Jaguars -1.446471
    15 Kansas City Chiefs 0.705882
    16 Las Vegas Raiders -0.735882
    17 Los Angeles Chargers 3.228235
    18 Los Angeles Rams 1.301765
    19 Miami Dolphins -0.456471
    20 Minnesota Vikings -4.195882
    21 New England Patriots -0.597059
    22 New Orleans Saints 3.064706
    23 New York Giants -5.836471
    24 New York Jets 1.105294
    25 Philadelphia Eagles 2.311765
    26 Pittsburgh Steelers -0.198235
    27 San Francisco 49ers -0.297647
    28 Seattle Seahawks 0.252941
    29 Tampa Bay Buccaneers -2.858824
    30 Tennessee Titans -2.528824
    31 Washington Commanders -2.844706
    Compile the data from both dataframes.

    [124]:
     
    ddiff = df2['P/G Diff']
    df1_sorted['P/G D Diff'] = ddiff
    df1_sorted
    [124]:
    Teams P/G O Diff P/G D Diff
    0 Arizona Cardinals -3.315882 1.346471
    1 Atlanta Falcons -2.352941 -0.131765
    2 Baltimore Ravens -3.445294 1.464118
    3 Buffalo Bills 2.474706 -0.903529
    4 Carolina Panthers 1.351176 -1.268235
    5 Chicago Bears -0.567059 -2.062941
    6 Cincinnati Bengals 2.532941 -0.714706
    7 Cleveland Browns -0.768235 1.087059
    8 Dallas Cowboys 0.308824 -3.148824
    9 Denver Broncos -5.688235 -1.874706
    10 Detroit Lions 3.857647 4.165882
    11 Green Bay Packers -3.807647 -3.542353
    12 Houston Texans -2.652941 -1.117059
    13 Indianapolis Colts -0.794118 -3.470000
    14 Jacksonville Jaguars -7.298824 -1.446471
    15 Kansas City Chiefs -2.430000 0.705882
    16 Las Vegas Raiders -2.121765 -0.735882
    17 Los Angeles Chargers -3.560588 3.228235
    18 Los Angeles Rams 1.375294 1.301765
    19 Miami Dolphins 4.692941 -0.456471
    20 Minnesota Vikings 1.129412 -4.195882
    21 New England Patriots -2.514706 -0.597059
    22 New Orleans Saints -1.850000 3.064706
    23 New York Giants -3.722353 -5.836471
    24 New York Jets 0.298235 1.105294
    25 Philadelphia Eagles -2.402353 2.311765
    26 Pittsburgh Steelers -1.646471 -0.198235
    27 San Francisco 49ers -4.862941 -0.297647
    28 Seattle Seahawks 2.787647 0.252941
    29 Tampa Bay Buccaneers 2.977647 -2.858824
    30 Tennessee Titans -0.314706 -2.528824
    31 Washington Commanders -3.983529 -2.844706
    Define the variables.

    [132]:
     
    t = df1_sorted['Teams']
    o = df1_sorted['P/G O Diff']
    d = df1_sorted['P/G D Diff']
    ### Barchart

    Plot a bar chart of all the data in the new dataframe so that every team has two bars, showing their points-scored and points-allowed differentials.

    [203]:
     
    plt.figure(figsize=(20,7))

    width = 0.3
    o_bar = np.arange(len(o))
    d_bar = [x + width for x in o_bar]

    plt.bar(o_bar, o, color ='blue', width = width, edgecolor ='black', label ='Offense', align='edge')
    plt.bar(d_bar, d, color ='red', width = width, edgecolor ='black', label ='Defense', align='edge')

    plt.axhline(y = 0, color = 'black', linestyle = '-')

    plt.xlabel('Teams')
    plt.ylabel('Point Differential')
    plt.xticks([r + width for r in range(len(o))], t, rotation=90)
    plt.legend()
    [203]:
    <matplotlib.legend.Legend at 0x1d0eec1a880>

    # Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

    All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is imported and read with pandas' read_csv function, and the resulting dataframe is stored as df.

    [1]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    from scipy.stats import rankdata
    import seaborn as sns
    df = pd.read_csv(r'C:\Users\Rob\Documents\2021stats.csv')
    print(df)

    Find the mean of all points allowed and add it back to the dataframe to use as the baseline later.

    [2]:
     
    np.mean(df['PF'])
    [2]:
    384.1041666666667
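    The cell above only computes the mean; the "added back to the df" step the text describes is presumably a constant column like the PFA column used in the baseline cell further down. A minimal sketch on a toy stand-in frame (the PFA column name is assumed from that later cell):

    ```python
    import pandas as pd

    # Toy stand-in for the 2021stats dataframe
    df = pd.DataFrame({'Tm': ['Arizona Cardinals21', 'Atlanta Falcons21', 'Baltimore Ravens21'],
                       'PF': [449, 313, 387]})

    # Broadcast the league-wide mean of points into a constant PFA column,
    # which can then serve as the naive baseline prediction
    df['PFA'] = round(df['PF'].mean(), 1)
    print(df['PFA'].tolist())
    ```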

    Check that all the cells in the dataframe are filled.

    [3]:
    df.describe()

    One-hot encode the categorical data into numbers so the machine learning model can analyse it.

    [4]:
    df = pd.get_dummies(df)
    df.iloc[:,:].head()
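    For intuition, running get_dummies on a toy frame shows how each category becomes its own 0/1 indicator column while numeric columns pass through unchanged (the toy column names here are illustrative, not the notebook's):

    ```python
    import pandas as pd

    # Toy frame: one categorical column ('Tm') and one numeric column ('PF')
    teams = pd.DataFrame({'Tm': ['Cardinals', 'Falcons', 'Cardinals'],
                          'PF': [449, 313, 420]})

    encoded = pd.get_dummies(teams)
    # 'Tm' is replaced by one indicator column per distinct value;
    # 'PF' is left untouched
    print(list(encoded.columns))
    ```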

    ### Choose what is to be predicted

    Define the variable to be predicted as labels. Then remove that column from the dataframe and convert the dataframe to an array.

    [5]:
    labels = np.array(df['PF'])
    df = df.drop('PF', axis = 1)
    df_list = list(df.columns)
    df = np.array(df)

    Import the train_test_split function from sklearn

    [6]:
     
    from sklearn.model_selection import train_test_split

    Create the training and testing splits. Also add the size of the testing data.

    [7]:
    train_df, test_df, train_labels, test_labels = train_test_split(df, labels, test_size = 0.25, random_state = 42)

    Print the shapes of the splits created above to make sure there are no errors in them

    [8]:
     
    print('Training df Shape:', train_df.shape)
    print('Training Labels Shape:', train_labels.shape)
    print('Testing df Shape:', test_df.shape)
    print('Testing Labels Shape:', test_labels.shape)
    Training df Shape: (72, 116)
    Training Labels Shape: (72,)
    Testing df Shape: (24, 116)
    Testing Labels Shape: (24,)
    

    ### Baseline

    Use the PFA (points-against average) column created earlier as the baseline prediction, then compute the baseline error against the testing labels.

    [16]:
    baseline_preds = test_df[:, df_list.index('PFA')]
    baseline_errors = abs(baseline_preds - test_labels)
    print('Average baseline error: ', round(np.mean(baseline_errors), 2))
    Average baseline error:  53.57
    

    ### Random Forest Regression

    Import RandomForestRegressor from sklearn.ensemble. Set the number of decision trees required, in this case 1000, then fit the model to the training set created above.

    [10]:
    from sklearn.ensemble import RandomForestRegressor
    rf = RandomForestRegressor(n_estimators = 1000, random_state = 42)
    rf.fit(train_df, train_labels);

    Now make the predictions based on the testing data and calculate the mean absolute error.

    [11]:
    predictions = rf.predict(test_df)
    errors = abs(predictions - test_labels)
    print('Mean Absolute Error:', round(np.mean(errors), 2), 'points.')
    Mean Absolute Error: 29.02 points.
    

    Now find the accuracy of the predictions by calculating the mean absolute percentage error (MAPE) and subtracting its mean from 100.

    [12]:
    mape = 100 * (errors / test_labels)
    accuracy = 100 - np.mean(mape)
    print('Accuracy:', round(accuracy, 2), '%.')
    Accuracy: 91.89 %.
    
    ### Displaying the Random Forest

    Convert one of the trees to a PNG file.

    [13]:
    from sklearn.tree import export_graphviz
    import pydot
    tree = rf.estimators_[5]
    export_graphviz(tree, out_file = 'tree.dot', feature_names = df_list, rounded = True, precision = 1)
    (graph, ) = pydot.graph_from_dot_file('tree.dot')
    graph.write_png('tree.png')
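    If Graphviz and pydot are not available, sklearn's built-in plot_tree can render a tree directly with matplotlib. A sketch on toy data (the feature names, depth limit, and output filename here are illustrative, not the notebook's):

    ```python
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend so this runs headless
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.tree import plot_tree

    # Toy regression data standing in for the NFL stats
    rng = np.random.RandomState(42)
    X = rng.rand(40, 3)
    y = X @ np.array([3.0, 1.0, 0.5]) + 0.1 * rng.rand(40)

    rf = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

    # Draw one estimator; max_depth keeps the plot readable
    fig, ax = plt.subplots(figsize=(12, 6))
    plot_tree(rf.estimators_[5], feature_names=['f0', 'f1', 'f2'],
              rounded=True, precision=1, max_depth=2, ax=ax)
    fig.savefig('tree_plot.png')
    ```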
    Read and display the image of the random forest.

    [14]:
     
    import matplotlib.image as mpimg
    plt_1 = plt.figure(figsize=(100, 100))
    img = mpimg.imread('tree.png')
    imgplot = plt.imshow(img)
    plt.show()

    ### List of Importances

    Make a list of the variables and their importance in the prediction. The list is then sorted from most important to least.

    [15]:
     
    importances = list(rf.feature_importances_)
    # List of tuples with variable and importance
    feature_importances = [(df, round(importance, 2)) for df, importance in zip(df_list, importances)]
    # Sort the feature importances by most important first
    feature_importances = sorted(feature_importances, key = lambda x: x[1], reverse = True)
    # Print out the feature and importances 
    [print('Variable: {:20} Importance: {}'.format(*pair)) for pair in feature_importances];
    Variable: 1D                   Importance: 0.25
    Variable: Ydsp                 Importance: 0.22
    Variable: CAY/PA               Importance: 0.08
    Variable: Att                  Importance: 0.06
    Variable: Att/Br               Importance: 0.06
    Variable: Yds                  Importance: 0.05
    Variable: YBC                  Importance: 0.04
    Variable: BrkTkl               Importance: 0.03
    Variable: Cmp                  Importance: 0.03
    Variable: CAY                  Importance: 0.03
    Variable: YAC                  Importance: 0.02
    Variable: YAC/Att              Importance: 0.02
    Variable: Attp                 Importance: 0.02
    Variable: IAY                  Importance: 0.02
    Variable: YACp                 Importance: 0.02
    Variable: YBC/Att              Importance: 0.01
    Variable: IAY/PA               Importance: 0.01
    Variable: CAY/Cmp              Importance: 0.01
    Variable: YAC/Cmp              Importance: 0.01
    Variable: PFA                  Importance: 0.0
    Variable: Tm_Arizona Cardinals19 Importance: 0.0
    Variable: Tm_Arizona Cardinals20 Importance: 0.0
    Variable: Tm_Arizona Cardinals21 Importance: 0.0
    Variable: Tm_Atlanta Falcons19 Importance: 0.0
    Variable: Tm_Atlanta Falcons20 Importance: 0.0
    Variable: Tm_Atlanta Falcons21 Importance: 0.0
    Variable: Tm_Baltimore Ravens19 Importance: 0.0
    Variable: Tm_Baltimore Ravens20 Importance: 0.0
    Variable: Tm_Baltimore Ravens21 Importance: 0.0
    Variable: Tm_Buffalo Bills19   Importance: 0.0
    Variable: Tm_Buffalo Bills20   Importance: 0.0
    Variable: Tm_Buffalo Bills21   Importance: 0.0
    Variable: Tm_Carolina Panthers19 Importance: 0.0
    Variable: Tm_Carolina Panthers20 Importance: 0.0
    Variable: Tm_Carolina Panthers21 Importance: 0.0
    Variable: Tm_Chicago Bears19   Importance: 0.0
    Variable: Tm_Chicago Bears20   Importance: 0.0
    Variable: Tm_Chicago Bears21   Importance: 0.0
    Variable: Tm_Cincinnati Bengals19 Importance: 0.0
    Variable: Tm_Cincinnati Bengals20 Importance: 0.0
    Variable: Tm_Cincinnati Bengals21 Importance: 0.0
    Variable: Tm_Cleveland Browns19 Importance: 0.0
    Variable: Tm_Cleveland Browns20 Importance: 0.0
    Variable: Tm_Cleveland Browns21 Importance: 0.0
    Variable: Tm_Dallas Cowboys19  Importance: 0.0
    Variable: Tm_Dallas Cowboys20  Importance: 0.0
    Variable: Tm_Dallas Cowboys21  Importance: 0.0
    Variable: Tm_Denver Broncos19  Importance: 0.0
    Variable: Tm_Denver Broncos20  Importance: 0.0
    Variable: Tm_Denver Broncos21  Importance: 0.0
    Variable: Tm_Detroit Lions19   Importance: 0.0
    Variable: Tm_Detroit Lions20   Importance: 0.0
    Variable: Tm_Detroit Lions21   Importance: 0.0
    Variable: Tm_Green Bay Packers19 Importance: 0.0
    Variable: Tm_Green Bay Packers20 Importance: 0.0
    Variable: Tm_Green Bay Packers21 Importance: 0.0
    Variable: Tm_Houston Texans19  Importance: 0.0
    Variable: Tm_Houston Texans20  Importance: 0.0
    Variable: Tm_Houston Texans21  Importance: 0.0
    Variable: Tm_Indianapolis Colts19 Importance: 0.0
    Variable: Tm_Indianapolis Colts20 Importance: 0.0
    Variable: Tm_Indianapolis Colts21 Importance: 0.0
    Variable: Tm_Jacksonville Jaguars19 Importance: 0.0
    Variable: Tm_Jacksonville Jaguars20 Importance: 0.0
    Variable: Tm_Jacksonville Jaguars21 Importance: 0.0
    Variable: Tm_Kansas City Chiefs19 Importance: 0.0
    Variable: Tm_Kansas City Chiefs20 Importance: 0.0
    Variable: Tm_Kansas City Chiefs21 Importance: 0.0
    Variable: Tm_Las Vegas Raiders20 Importance: 0.0
    Variable: Tm_Las Vegas Raiders21 Importance: 0.0
    Variable: Tm_Los Angeles Chargers19 Importance: 0.0
    Variable: Tm_Los Angeles Chargers20 Importance: 0.0
    Variable: Tm_Los Angeles Chargers21 Importance: 0.0
    Variable: Tm_Los Angeles Rams19 Importance: 0.0
    Variable: Tm_Los Angeles Rams20 Importance: 0.0
    Variable: Tm_Los Angeles Rams21 Importance: 0.0
    Variable: Tm_Miami Dolphins19  Importance: 0.0
    Variable: Tm_Miami Dolphins20  Importance: 0.0
    Variable: Tm_Miami Dolphins21  Importance: 0.0
    Variable: Tm_Minnesota Vikings19 Importance: 0.0
    Variable: Tm_Minnesota Vikings20 Importance: 0.0
    Variable: Tm_Minnesota Vikings21 Importance: 0.0
    Variable: Tm_New England Patriots19 Importance: 0.0
    Variable: Tm_New England Patriots20 Importance: 0.0
    Variable: Tm_New England Patriots21 Importance: 0.0
    Variable: Tm_New Orleans Saints19 Importance: 0.0
    Variable: Tm_New Orleans Saints20 Importance: 0.0
    Variable: Tm_New Orleans Saints21 Importance: 0.0
    Variable: Tm_New York Giants19 Importance: 0.0
    Variable: Tm_New York Giants20 Importance: 0.0
    Variable: Tm_New York Giants21 Importance: 0.0
    Variable: Tm_New York Jets19   Importance: 0.0
    Variable: Tm_New York Jets20   Importance: 0.0
    Variable: Tm_New York Jets21   Importance: 0.0
    Variable: Tm_Oakland Raiders19 Importance: 0.0
    Variable: Tm_Philadelphia Eagles19 Importance: 0.0
    Variable: Tm_Philadelphia Eagles20 Importance: 0.0
    Variable: Tm_Philadelphia Eagles21 Importance: 0.0
    Variable: Tm_Pittsburgh Steelers19 Importance: 0.0
    Variable: Tm_Pittsburgh Steelers20 Importance: 0.0
    Variable: Tm_Pittsburgh Steelers21 Importance: 0.0
    Variable: Tm_San Francisco 49ers19 Importance: 0.0
    Variable: Tm_San Francisco 49ers20 Importance: 0.0
    Variable: Tm_San Francisco 49ers21 Importance: 0.0
    Variable: Tm_Seattle Seahawks19 Importance: 0.0
    Variable: Tm_Seattle Seahawks20 Importance: 0.0
    Variable: Tm_Seattle Seahawks21 Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers19 Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers20 Importance: 0.0
    Variable: Tm_Tampa Bay Buccaneers21 Importance: 0.0
    Variable: Tm_Tennessee Titans19 Importance: 0.0
    Variable: Tm_Tennessee Titans20 Importance: 0.0
    Variable: Tm_Tennessee Titans21 Importance: 0.0
    Variable: Tm_Washington Football Team20 Importance: 0.0
    Variable: Tm_Washington Football Team21 Importance: 0.0
    Variable: Tm_Washington Redskins19 Importance: 0.0
    

    # Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

    All modules and libraries are imported. The CSV containing raw data from ProFootballReference.com is imported and read with pandas' read_csv function, and the resulting dataframe is stored as df.

    [19]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    from scipy.stats import rankdata
    import seaborn as sns
    df = pd.read_csv(r'C:\Users\Rob\Documents\2021stats.csv')
    print(df)
                            Tm  Att   Yds   1D   YBC  YBC/Att   YAC  YAC/Att  \
    0      Arizona Cardinals21  496  2076  127  1286      2.6   790      1.6   
    1        Atlanta Falcons21  393  1451   75   811      2.1   640      1.6   
    2       Baltimore Ravens21  517  2479  159  1579      3.1   900      1.7   
    3          Buffalo Bills21  461  2209  134  1208      2.6  1001      2.2   
    4      Carolina Panthers21  455  1842  117  1036      2.3   806      1.8   
    ..                     ...  ...   ...  ...   ...      ...   ...      ...   
    91   San Francisco 49ers19  498  2305  110  1472      3.0   833      1.7   
    92      Seattle Seahawks19  481  2200  121  1125      2.3  1075      2.2   
    93  Tampa Bay Buccaneers19  409  1521   81   746      1.8   775      1.9   
    94      Tennessee Titans19  445  2223  104   940      2.1  1283      2.9   
    95   Washington Redskins19  356  1583   74   680      1.9   903      2.5   
    
        BrkTkl  Att/Br  ...  Ydsp   IAY  IAY/PA   CAY  CAY/Cmp  CAY/PA  YACp  \
    0       28    17.7  ...  4276  4459     7.5  2340      5.6     4.0  2279   
    1       19    20.7  ...  3713  4127     7.2  2252      6.0     3.9  1735   
    2       31    16.7  ...  3961  5239     8.6  2552      6.4     4.2  1715   
    3       40    11.5  ...  4284  5364     8.2  2690      6.5     4.1  1760   
    4       31    14.7  ...  3239  4434     7.4  1751      5.0     2.9  1822   
    ..     ...     ...  ...   ...   ...     ...   ...      ...     ...   ...   
    91      29    17.2  ...  3792  3124     6.5  1837      5.5     3.8  2192   
    92      38    12.7  ...  3791  4869     9.4  2402      7.0     4.6  1708   
    93      35    11.7  ...  4845  6498    10.3  3254      8.5     5.2  1873   
    94      40    11.1  ...  3582  3869     8.6  2103      7.1     4.7  1853   
    95      15    23.7  ...  2812  3639     7.6  1751      5.9     3.7  1454   
    
        YAC/Cmp   PF    PFA  
    0       5.5  449  384.1  
    1       4.6  313  384.1  
    2       4.3  387  384.1  
    3       4.2  483  384.1  
    4       5.2  304  384.1  
    ..      ...  ...    ...  
    91      6.6  479  384.1  
    92      5.0  405  384.1  
    93      4.9  458  384.1  
    94      6.2  402  384.1  
    95      4.9  266  384.1  
    
    [96 rows x 22 columns]
    
### Bar Chart

Display the teams and a variable

    [5]:
    plt.figure(figsize=(17,5))
    plt.bar(df['Tm'], df['Ydsp'])
    plt.xticks(rotation = 90)
    plt.ylim(2500, 5250)
    plt.show()
    [6]:
    plt.scatter(df['PF'], df['1D'])
    [6]:
    <matplotlib.collections.PathCollection at 0x175720b5f40>
Define the columns that will be used in the models or tests

    [7]:
    pf = df['PF']
    pf = np.array(pf).reshape((-1,1))
    d1 = df['1D']
### Linear Regression

Perform a Linear Regression

    [8]:
model = LinearRegression().fit(pf, d1)

    r_sq = model.score(pf, d1)
    print(f"coefficient of determination: {r_sq}")
    print(f"intercept: {model.intercept_}")
    print(f"slope: {model.coef_}")
    coefficient of determination: 0.26945079972915786
    intercept: 41.16998558184153
    slope: [0.17247061]
    
    [9]:
    d1_pred = model.intercept_ + model.coef_ * pf
    print(f"predicted response:\n{d1_pred}")
    predicted response:
    [[118.60929082]
     [ 95.15328745]
     [107.91611282]
     [124.47329167]
     [ 93.60105194]
     [ 94.80834623]
     [120.50646757]
     [101.36222952]
     [132.57941048]
     [ 98.94764094]
     [ 97.22293481]
     [118.78176144]
     [ 89.46175722]
     [118.95423205]
     [ 84.80505067]
     [123.95587983]
     [105.67399485]
     [122.92105615]
     [120.50646757]
     [ 99.98246462]
     [114.46999611]
     [120.85140879]
     [103.94928872]
     [ 85.66740374]
     [ 94.63587562]
     [117.74693776]
     [100.32740584]
     [114.81493734]
     [109.29587772]
     [129.30246883]
     [113.43517243]
     [ 98.94764094]
     [111.88293692]
     [109.46834833]
     [121.88623247]
     [127.5777627 ]
     [101.53470014]
     [105.32905362]
     [ 94.80834623]
     [111.53799569]
     [109.29587772]
     [ 96.87799358]
     [106.19140669]
     [128.9575276 ]
     [107.39870098]
     [118.95423205]
     [ 93.94599316]
     [122.74858554]
     [116.02223163]
     [107.39870098]
     [105.32905362]
     [110.84811324]
     [115.33234918]
     [ 97.39540542]
     [124.30082105]
     [ 89.46175722]
     [ 83.08034454]
     [ 98.77517033]
     [112.91776059]
     [106.01893607]
     [120.33399695]
     [126.02552718]
     [125.85305657]
     [ 98.94764094]
     [103.43187688]
     [106.88128914]
     [132.75188109]
     [ 95.32575807]
     [ 99.80999401]
     [ 89.46175722]
     [ 89.28928661]
     [ 98.94764094]
     [116.02223163]
     [ 89.80669845]
     [ 99.98246462]
     [106.01893607]
     [106.3638773 ]
     [103.43187688]
     [ 92.91116948]
     [118.95423205]
     [ 99.29258217]
     [109.12340711]
     [ 93.94599316]
     [111.36552508]
     [113.60764305]
     [120.16152634]
     [ 99.98246462]
     [ 88.77187477]
     [ 95.15328745]
     [107.57117159]
     [ 91.01399274]
     [123.78340921]
     [111.02058385]
     [120.16152634]
     [110.50317201]
     [ 87.04716864]]
    
### Redefining

Redefine the variables so that x holds the two features that were most important in the Random Forest Regression

    [10]:
    x = df['1D'], df['Ydsp']
    y = df['PF']
    x, y = np.array(x), np.array(y)
    [11]:
    x = np.vstack((df['1D'], df['Ydsp'])).T
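Equivalently, the two-column feature matrix can be built by selecting the columns directly. A small sketch, using a hypothetical two-row frame with the same column names (the real df is assumed elsewhere):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame standing in for df; same column names assumed.
df_demo = pd.DataFrame({'1D': [127, 75], 'Ydsp': [4276, 3713]})

# df[['1D', 'Ydsp']].to_numpy() yields the same array as
# np.vstack((df['1D'], df['Ydsp'])).T
x_demo = df_demo[['1D', 'Ydsp']].to_numpy()
print(np.array_equal(x_demo, np.vstack((df_demo['1D'], df_demo['Ydsp'])).T))  # → True
```

Either form works; the column-selection version avoids listing each series twice.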
    [12]:
    x
    [12]:
    array([[ 127, 4276],
           [  75, 3713],
           [ 159, 3961],
           [ 134, 4284],
           [ 117, 3239],
           [ 119, 3207],
           [ 101, 4403],
           [ 138, 3320],
           [ 111, 4800],
           [ 123, 3593],
           [ 104, 3598],
           [ 109, 4315],
           [  77, 3305],
           [ 154, 3361],
           [  92, 3436],
           [ 119, 4791],
           [  95, 4567],
           [ 112, 4800],
           [ 101, 4642],
           [  87, 3651],
           [ 103, 4238],
           [ 139, 3857],
           [ 113, 3186],
           [  90, 3196],
           [  87, 3541],
           [ 163, 3404],
           [  85, 3778],
           [ 130, 4221],
           [ 106, 3432],
           [ 106, 5229],
           [ 134, 3418],
           [ 129, 3441],
           [ 136, 3916],
           [  86, 4363],
           [ 165, 2739],
           [ 119, 4620],
           [ 110, 3888],
           [  93, 3655],
           [  92, 3448],
           [ 133, 3539],
           [ 115, 4161],
           [  94, 3451],
           [  93, 4104],
           [ 114, 4106],
           [  83, 4538],
           [ 129, 4053],
           [  80, 3699],
           [ 110, 4854],
           [ 121, 4217],
           [ 111, 4329],
           [ 123, 4014],
           [ 100, 3736],
           [ 139, 4009],
           [ 143, 2890],
           [ 147, 3758],
           [  91, 3026],
           [  94, 2796],
           [ 114, 3327],
           [  81, 4003],
           [ 101, 4033],
           [ 111, 3941],
           [  82, 4626],
           [ 142, 3653],
           [ 108, 3465],
           [ 109, 3477],
           [  84, 4714],
           [ 188, 3225],
           [ 120, 3229],
           [  82, 3650],
           [  85, 3291],
           [  85, 3652],
           [  90, 3554],
           [ 120, 4751],
           [  76, 3115],
           [  82, 3900],
           [  90, 3733],
           [ 112, 3783],
           [ 131, 3108],
           [  84, 3760],
           [  93, 4498],
           [  90, 4426],
           [  92, 4499],
           [  64, 3804],
           [ 106, 3523],
           [ 110, 3961],
           [  97, 4244],
           [  89, 3731],
           [  61, 3111],
           [ 104, 3926],
           [ 104, 3833],
           [  75, 2981],
           [ 110, 3792],
           [ 121, 3791],
           [  81, 4845],
           [ 104, 3582],
           [  74, 2812]], dtype=int64)
### Linear Regression

Use a linear regression on the x and y variables so we can now see the coefficients of the combined features

    [13]:
model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")
    coefficient of determination: 0.7068457568512116
    intercept: -135.88423483853654
    coefficients: [1.73092809 0.08727336]
    
Use the model to predict y based on the x variables already used

    [14]:
    y_pred = model.predict(x)
    print(f"predicted response:\n{y_pred}")
    predicted response:
    [457.12451928 317.98135694 485.0231099  469.93920281 349.31276427
     349.98187294 423.20410558 392.73139635 475.16091034 390.59310219
     358.14183523 429.37147466 285.83568233 424.00445357 323.23241384
     488.22287484 427.13136803 476.89183843 444.06243857 333.34154574
     412.2658574  441.32811864 337.76356383 298.82495131 323.74147616
     443.33556087 340.96340625 457.51726877 347.11631369 503.94654122
     394.36047324 387.71312005 441.28446259 393.74924981 388.7606328
     473.29913031 393.83667812 344.07620773 324.27969416 403.18962168
     426.3169458  328.00337043 383.26194627 419.78598292 403.8293035
     441.12441624 325.41417036 478.14274367 441.58982251 434.05515788
     427.33518665 363.26184652 454.59366933 363.8584921  446.53548076
     285.71940824 270.83931977 351.80003565 353.67619983 390.91296246
     400.19309428 409.77843106 428.71713752 353.45819075 356.23639916
     420.92034291 470.98683177 353.63281495 324.59963192 298.46128003
     329.96696291 330.06881411 486.46286854 267.52281588 346.41797186
     345.69074552 388.13483153 362.11294743 337.66155768 417.64765003
     406.17118384 416.00399529 306.88302367 355.05818944 400.20763339
     402.40392901 343.7852707  241.20980106 386.76749724 378.65107478
     254.09725758 385.45843558 404.41137124 427.16036876 356.74546148
     237.61713168]
    
    [15]:
    y_pred = model.intercept_ + np.sum(model.coef_ * x, axis=1)
    print(f"predicted response:\n{y_pred}")
    predicted response:
    [457.12451928 317.98135694 485.0231099  469.93920281 349.31276427
     349.98187294 423.20410558 392.73139635 475.16091034 390.59310219
     358.14183523 429.37147466 285.83568233 424.00445357 323.23241384
     488.22287484 427.13136803 476.89183843 444.06243857 333.34154574
     412.2658574  441.32811864 337.76356383 298.82495131 323.74147616
     443.33556087 340.96340625 457.51726877 347.11631369 503.94654122
     394.36047324 387.71312005 441.28446259 393.74924981 388.7606328
     473.29913031 393.83667812 344.07620773 324.27969416 403.18962168
     426.3169458  328.00337043 383.26194627 419.78598292 403.8293035
     441.12441624 325.41417036 478.14274367 441.58982251 434.05515788
     427.33518665 363.26184652 454.59366933 363.8584921  446.53548076
     285.71940824 270.83931977 351.80003565 353.67619983 390.91296246
     400.19309428 409.77843106 428.71713752 353.45819075 356.23639916
     420.92034291 470.98683177 353.63281495 324.59963192 298.46128003
     329.96696291 330.06881411 486.46286854 267.52281588 346.41797186
     345.69074552 388.13483153 362.11294743 337.66155768 417.64765003
     406.17118384 416.00399529 306.88302367 355.05818944 400.20763339
     402.40392901 343.7852707  241.20980106 386.76749724 378.65107478
     254.09725758 385.45843558 404.41137124 427.16036876 356.74546148
     237.61713168]
    
Input all the data from the current season for the variables used to build the model above. None of these variables are averages like yards/play, so each had to be converted to a full-season projection: divide the cumulative value by the games played by that team, then multiply by 17 (the number of games in a season).
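The conversion described above can be sketched as follows (the numbers here are hypothetical, for illustration only):

```python
# Project a cumulative stat to a full 17-game season.
games_played = 7      # games the team has played so far (hypothetical)
season_games = 17     # regular-season length
stat_so_far = 940     # cumulative value through those games (hypothetical)

projection = stat_so_far / games_played * season_games
print(round(projection, 1))  # → 2282.9
```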

    [16]:
    x_new = [153,2579],[97,5451],[82,2885],[133,2088],[90,4738],[163,3645],[85,4429],[131,3761],[106,3115],[87,3725],[96,3635],[106,3761],[99,3558],[119,3975],[92,5032],[80,4786],[96,4114],[128,4356],[121,3796],[167,2734],[102,4369],[96,2813],[153,3475],[82,3582],[130,4057],[68,4068],[157,3225],[87,4750],[114,3837],[116,3982],[72,4524],[104,3757]
    x_new
    [16]:
    ([153, 2579],
     [97, 5451],
     [82, 2885],
     [133, 2088],
     [90, 4738],
     [163, 3645],
     [85, 4429],
     [131, 3761],
     [106, 3115],
     [87, 3725],
     [96, 3635],
     [106, 3761],
     [99, 3558],
     [119, 3975],
     [92, 5032],
     [80, 4786],
     [96, 4114],
     [128, 4356],
     [121, 3796],
     [167, 2734],
     [102, 4369],
     [96, 2813],
     [153, 3475],
     [82, 3582],
     [130, 4057],
     [68, 4068],
     [157, 3225],
     [87, 4750],
     [114, 3837],
     [116, 3982],
     [72, 4524],
     [104, 3757])
### Predicting current season

Use the data from the current season to predict every team's points scored over the full season

    [17]:
    y_new = model.predict(x_new)
    y_new
    [17]:
    array([354.02575813, 507.74287427, 257.83551168, 276.55597663,
           433.4004721 , 464.36844058, 397.77836346, 419.10245137,
           319.45065864, 339.79977436, 347.52352481, 375.82924906,
           345.99626038, 417.00781325, 462.52069606, 420.28031245,
           389.32746415, 465.83731616, 404.84773804, 391.78612218,
           421.96773944, 275.78482307, 432.22268849, 318.66504345,
           443.20443777, 336.84690302, 417.32806092, 429.25496814,
           396.30944914, 412.4259425 , 383.56726745, 372.01829944])
Print the predictions for each team, listed in alphabetical order

    [18]:
    ylist = list(y_new)
    print(type(ylist))
    print(ylist)
    print(len(ylist))
    <class 'list'>
    [354.025758128841, 507.7428742662682, 257.8355116840193, 276.55597663408105, 433.4004720970019, 464.3684405774597, 397.7783634638899, 419.1024513654133, 319.4506586439897, 339.79977436144287, 347.5235248098312, 375.82924906324644, 345.9962603828671, 417.00781325374936, 462.5206960571218, 420.28031244567745, 389.327464145472, 465.8373161595213, 404.84773803692116, 391.7861221842847, 421.9677394424355, 275.7848230689193, 432.2226884936305, 318.66504345216475, 443.2044377688374, 336.8469030170672, 417.3280609164444, 429.2549681381275, 396.30944914338187, 412.4259424959642, 383.5672674460657, 372.0182994399446]
    32
    
Create a list of all the teams in the NFL in alphabetical order

    [159]:
    Teams = ('Atlanta Falcons', 'Buffalo Bills', 'Carolina Panthers', 'Chicago Bears', 'Cincinnati Bengals', 'Cleveland Browns', 'Indianapolis Colts', 'Arizona Cardinals', 'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers', 'Houston Texans', 'Jacksonville Jaguars', 'Kansas City Chiefs', 'Miami Dolphins', 'Minnesota Vikings', 'New Orleans Saints', 'New England Patriots', 'New York Giants', 'New York Jets', 'Tennessee Titans', 'Philadelphia Eagles', 'Pittsburgh Steelers', 'Las Vegas Raiders', 'Los Angeles Rams', 'Baltimore Ravens', 'Los Angeles Chargers', 'Seattle Seahawks', 'San Francisco 49ers', 'Tampa Bay Buccaneers', 'Washington Commanders')
    print(len(Teams))
    32
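Because the predictions come out in alphabetical order, the labels only line up if the Teams tuple is alphabetical too. A small hedged sanity-check helper (the three-team tuples below are hypothetical examples, not the full list):

```python
# Verify a sequence of team names is alphabetical before pairing it
# with alphabetically ordered predictions.
def is_alphabetical(names):
    return list(names) == sorted(names)

print(is_alphabetical(('Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens')))  # → True
print(is_alphabetical(('Atlanta Falcons', 'Arizona Cardinals')))  # → False
```

Running `is_alphabetical(Teams)` before building the dataframe would catch any ordering mismatch.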
    
Create a data frame for the predictions created

    [21]:
data = ylist
df1 = pd.DataFrame(data, columns=['Pred'])
df1
    [161]:
    df1['Teams'] = Teams
    print(df1)
Save the data frame

    [163]:
    df1.to_csv(r'C:\Users\Rob\Documents\Prediction.csv', index = False)
    ---------------------------------------------------------------------------
    PermissionError                           Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_6928/2031236065.py in <module>
    ----> 1 df1.to_csv(r'C:\Users\Rob\Documents\Prediction.csv', index = False)
    
    ~\mambaforge\lib\site-packages\pandas\core\generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
       3549         )
       3550 
    -> 3551         return DataFrameRenderer(formatter).to_csv(
       3552             path_or_buf,
       3553             line_terminator=line_terminator,
    
    ~\mambaforge\lib\site-packages\pandas\io\formats\format.py in to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
       1178             formatter=self.fmt,
       1179         )
    -> 1180         csv_formatter.save()
       1181 
       1182         if created_buffer:
    
    ~\mambaforge\lib\site-packages\pandas\io\formats\csvs.py in save(self)
        239         """
        240         # apply compression and byte/text conversion
    --> 241         with get_handle(
        242             self.filepath_or_buffer,
        243             self.mode,
    
    ~\mambaforge\lib\site-packages\pandas\io\common.py in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
        787         if ioargs.encoding and "b" not in ioargs.mode:
        788             # Encoding
    --> 789             handle = open(
        790                 handle,
        791                 ioargs.mode,
    
    PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Rob\\Documents\\Prediction.csv'
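The PermissionError above typically means the CSV was open in another program (e.g. Excel) when the cell ran. One possible fallback, sketched under that assumption, writes to a timestamped alternative filename when the original path is locked; the function name and paths are illustrative:

```python
import pandas as pd
from datetime import datetime

def save_with_fallback(frame, path):
    """Try to save `frame` to `path`; on PermissionError, fall back to a
    timestamped filename so the predictions are not lost."""
    try:
        frame.to_csv(path, index=False)
        return path
    except PermissionError:
        alt = path.replace('.csv', datetime.now().strftime('_%Y%m%d_%H%M%S.csv'))
        frame.to_csv(alt, index=False)
        return alt

# Hypothetical usage:
# saved_to = save_with_fallback(df1, r'C:\Users\Rob\Documents\Prediction.csv')
```

Closing the file in the other program and re-running the original cell also resolves the error.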
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

Import all necessary libraries, modules and the dataframe chosen

    [1]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns
df = pd.read_csv(r'C:\Users\Rob\Documents\dstats.csv')
print(df)
Define the 3 most important variables from the Random Forest Regression that are not directly related, and the y variable, which is what the model was trained to predict. Make the 3 variables into an array.

    [31]:
x = df['ANY/A'], df['Rate'], df['Ydsr']
y = df['PA']
x, y = np.array(x), np.array(y)

x = np.vstack((df['ANY/A'], df['Rate'], df['Ydsr'])).T
print(x)
### Linear Regression

Conduct a Linear Regression on the variables defined above

    [12]:
model = LinearRegression().fit(x, y)

r_sq = model.score(x, y)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")
    coefficient of determination: 0.7507669629553981
    intercept: -72.00298605592161
    coefficients: [29.23877774  1.76379634  0.05969919]
    
Prepare the model to predict the points allowed for the current season.

    [33]:
y_pred = model.predict(x)
print(f"predicted response:\n{y_pred}")

y_pred = model.intercept_ + np.sum(model.coef_ * x, axis=1)
print(f"predicted response:\n{y_pred}")
### Current season data

Input all the data for the chosen variables for the current season. This data was taken from week 7, which is problematic because some teams had their byes, so the data had to be taken manually from ProFootballReference. Stats that are already averaged rather than cumulative, such as QB Rate and ANY/A, did not suffer from this issue, but Ydsr did. As a result, a projected number was used: the current total divided by games played and multiplied by 17 (the number of games in the regular season).

    [20]:
    x_new = [6.9, 97.2, 1719],[7.2, 100.2, 1705],[5.9, 90.1, 1787],[3.9, 71.2, 1294],[6.1, 87.7, 2054],[4.9, 75.5, 2545],[5.2, 75.2, 2023],[7, 92.7, 2304],[4.1, 77.7, 2042],[4, 74.9, 2238],[7.8, 102.4, 2372],[6.1, 95, 2372],[5.9, 78.8, 2799],[6.1, 94.9, 2098],[5.8, 84.1, 1874],[6.9, 103.8, 1564],[7.5, 104.7, 1768],[5.9, 93, 2338],[5.6, 86.5, 1517],[6.9, 98.9, 1755],[6.8, 93.7, 1912],[5.3, 81.5, 2324],[7.1, 98.4, 2093],[6.5, 85.5, 2455],[4.9, 77.3, 1789],[3.5, 66, 1867],[6.6, 87.6, 2020],[5.3, 86.6, 1547],[6.8, 95.9, 2545],[5.1, 86.4, 2010],[6.9, 100.4, 1646],[7, 99.1, 2010]
    print(x_new)
    ([6.9, 97.2, 1719], [7.2, 100.2, 1705], [5.9, 90.1, 1787], [3.9, 71.2, 1294], [6.1, 87.7, 2054], [4.9, 75.5, 2545], [5.2, 75.2, 2023], [7, 92.7, 2304], [4.1, 77.7, 2042], [4, 74.9, 2238], [7.8, 102.4, 2372], [6.1, 95, 2372], [5.9, 78.8, 2799], [6.1, 94.9, 2098], [5.8, 84.1, 1874], [6.9, 103.8, 1564], [7.5, 104.7, 1768], [5.9, 93, 2338], [5.6, 86.5, 1517], [6.9, 98.9, 1755], [6.8, 93.7, 1912], [5.3, 81.5, 2324], [7.1, 98.4, 2093], [6.5, 85.5, 2455], [4.9, 77.3, 1789], [3.5, 66, 1867], [6.6, 87.6, 2020], [5.3, 86.6, 1547], [6.8, 95.9, 2545], [5.1, 86.4, 2010], [6.9, 100.4, 1646], [7, 99.1, 2010])
    
### Model the current season data

Now use the current season's data in the model to predict the number of points each team will allow over the season.

    [34]:
    y_new = model.predict(x_new)
    y_new
    [34]:
    array([403.80849401, 417.03572765, 366.10630729, 244.86129976,
           383.66063572, 356.36809008, 333.44760663, 433.71931519,
           306.82872659, 310.66726058, 478.27870685, 415.52069181,
           406.59099032, 398.98673375, 357.79348115, 406.19617516,
           437.50549354, 404.11557108, 334.86622551, 408.95611866,
           406.23327298, 365.45285791, 434.10030271, 415.4151706 ,
           314.41033485, 258.20168434, 396.07387245, 328.06194756,
           447.90321302, 349.50215832, 405.09460131, 427.45604949])
Now define the predictions and display the type and length of the list, which must be 32 (the number of teams in the NFL)

    [35]:
    ylist = list(y_new)
    print(type(ylist))
    print(ylist)
    print(len(ylist))
    <class 'list'>
    [403.80849400500233, 417.03572765423337, 366.1063072939253, 244.86129975785502, 383.66063572031277, 356.3680900799129, 333.447606633793, 433.71931519290433, 306.82872659012247, 310.6672605762693, 478.2787068459364, 415.5206918052479, 406.5909903224467, 398.98673375048844, 357.7934811542876, 406.1961751617915, 437.50549353751956, 404.11557108196075, 334.86622550923886, 408.9561186620054, 406.2332729826779, 365.4528579054986, 434.10030270854656, 415.4151705987924, 314.4103348462005, 258.20168434438426, 396.07387245301845, 328.0619475596736, 447.90321302431096, 349.5021583247003, 405.09460131082767, 427.45604948942344]
    32
    
Now create a new list of all the teams in alphabetical order

    [36]:
    Teams = ('Arizona Cardinals','Atlanta Falcons','Baltimore Ravens','Buffalo Bills','Carolina Panthers','Chicago Bears','Cincinnati Bengals','Cleveland Browns','Dallas Cowboys','Denver Broncos','Detroit Lions','Green Bay Packers','Houston Texans','Indianapolis Colts','Jacksonville Jaguars','Kansas City Chiefs','Las Vegas Raiders','Los Angeles Chargers','Los Angeles Rams','Miami Dolphins','Minnesota Vikings','New England Patriots','New Orleans Saints','New York Giants','New York Jets','Philadelphia Eagles','Pittsburgh Steelers','San Francisco 49ers','Seattle Seahawks','Tampa Bay Buccaneers','Tennessee Titans','Washington Commanders')
    print(len(Teams))
    32
    
### New dataframe

Create a new dataframe consisting of the teams and their corresponding points predicted by the model for the current season.

    [38]:
data = ylist
df1 = pd.DataFrame(data, columns=['Pred'])

df1['Teams'] = Teams
print(df1)
              Pred                  Teams
    0   403.808494      Arizona Cardinals
    1   417.035728        Atlanta Falcons
    2   366.106307       Baltimore Ravens
    3   244.861300          Buffalo Bills
    4   383.660636      Carolina Panthers
    5   356.368090          Chicago Bears
    6   333.447607     Cincinnati Bengals
    7   433.719315       Cleveland Browns
    8   306.828727         Dallas Cowboys
    9   310.667261         Denver Broncos
    10  478.278707          Detroit Lions
    11  415.520692      Green Bay Packers
    12  406.590990         Houston Texans
    13  398.986734     Indianapolis Colts
    14  357.793481   Jacksonville Jaguars
    15  406.196175     Kansas City Chiefs
    16  437.505494      Las Vegas Raiders
    17  404.115571   Los Angeles Chargers
    18  334.866226       Los Angeles Rams
    19  408.956119         Miami Dolphins
    20  406.233273      Minnesota Vikings
    21  365.452858   New England Patriots
    22  434.100303     New Orleans Saints
    23  415.415171        New York Giants
    24  314.410335          New York Jets
    25  258.201684    Philadelphia Eagles
    26  396.073872    Pittsburgh Steelers
    27  328.061948    San Francisco 49ers
    28  447.903213       Seattle Seahawks
    29  349.502158   Tampa Bay Buccaneers
    30  405.094601       Tennessee Titans
    31  427.456049  Washington Commanders
    
### Save

Save the data frame

    [30]:
    df1.to_csv(r'C:\Users\Rob\Documents\DPrediction.csv', index = False)
# Random Forest Regression of the defence of NFL teams through the 2018-21 seasons

Import all necessary libraries and modules. The chosen dataframe includes the predictions made, the points allowed, points allowed per game (from ProFootballReference), and the point projection, which is the points allowed divided by games played and multiplied by 17.

    [ ]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import linear_model
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    from statsmodels.graphics.factorplots import interaction_plot
    from scipy import stats
    import matplotlib.colors as mcolors
    from scipy.stats import rankdata
    import seaborn as sns
df = pd.read_csv(r'C:\Users\Rob\Documents\DPrediction.csv')
print(df)
Make a new attribute by dividing the predictions by 17 to get the predicted points allowed per game.

    [94]:
    preg = df['Pred']/17
    df['Pred/G'] = preg
    df
    [94]:
    Teams G Pred PA Pts/G Point Projection Pred/G
    0 Arizona Cardinals 7 403.81 176 25.1 427.428571 23.753529
    1 Atlanta Falcons 7 417.04 171 24.4 415.285714 24.531765
    2 Baltimore Ravens 7 366.11 161 23.0 391.000000 21.535882
    3 Buffalo Bills 6 244.86 81 13.5 229.500000 14.403529
    4 Carolina Panthers 7 383.66 149 21.3 361.857143 22.568235
    5 Chicago Bears 7 356.37 132 18.9 320.571429 20.962941
    6 Cincinnati Bengals 7 333.45 132 18.9 320.571429 19.614706
    7 Cleveland Browns 7 433.72 186 26.6 451.714286 25.512941
    8 Dallas Cowboys 7 306.83 104 14.9 252.571429 18.048824
    9 Denver Broncos 7 310.67 115 16.4 279.285714 18.274706
    10 Detroit Lions 6 478.28 194 32.3 549.666667 28.134118
    11 Green Bay Packers 7 415.52 146 20.9 354.571429 24.442353
    12 Houston Texans 6 406.59 137 22.8 388.166667 23.917059
    13 Indianapolis Colts 7 398.99 140 20.0 340.000000 23.470000
    14 Jacksonville Jaguars 7 357.79 137 19.6 332.714286 21.046471
    15 Kansas City Chiefs 7 406.20 172 24.6 417.714286 23.894118
    16 Las Vegas Raiders 6 437.51 150 25.0 425.000000 25.735882
    17 Los Angeles Chargers 7 404.12 189 27.0 459.000000 23.771765
    18 Los Angeles Rams 6 334.87 126 21.0 357.000000 19.698235
    19 Miami Dolphins 7 408.96 165 23.6 400.714286 24.056471
    20 Minnesota Vikings 6 406.23 118 19.7 334.333333 23.895882
    21 New England Patriots 7 365.45 146 20.9 354.571429 21.497059
    22 New Orleans Saints 7 434.10 200 28.6 485.714286 25.535294
    23 New York Giants 7 415.42 130 18.6 315.714286 24.436471
    24 New York Jets 7 314.41 137 19.6 332.714286 18.494706
    25 Philadelphia Eagles 6 258.20 105 17.5 297.500000 15.188235
    26 Pittsburgh Steelers 7 396.07 162 23.1 393.428571 23.298235
    27 San Francisco 49ers 7 328.06 133 19.0 323.000000 19.297647
    28 Seattle Seahawks 7 447.90 186 26.6 451.714286 26.347059
    29 Tampa Bay Buccaneers 7 349.50 124 17.7 301.142857 20.558824
    30 Tennessee Titans 6 405.09 128 21.3 362.666667 23.828824
    31 Washington Commanders 7 427.46 156 22.3 378.857143 25.144706
### Scatter Graph

Scatter the projected points allowed against the predicted points allowed.

    [95]:
    plt.scatter(df['Point Projection'], df['Pred'], c = df['Pred'], cmap = 'Reds')
    plt.xlabel('Point Projection')
    plt.ylabel('My Prediction')
    plt.title('Point Prediction vs My Prediction')
    plt.xlim(220,560)
    plt.ylim(220,560)
    plt.show()
### Defining Variables

Define the variables and reshape them so that a linear regression can be conducted more easily.

    [96]:
    pp = df['Point Projection']
    pp = np.array(pp).reshape((-1,1))
    pre = df['Pred']
    preg = df['Pred/G']
    preg = np.array(preg).reshape(-1,1)
    pg = df['Pts/G']
### Linear Regression

Conduct the linear regression for the predicted and projected points allowed per game

    [97]:
     
model = LinearRegression().fit(preg, pg)

r_sq = model.score(preg, pg)
print(f"coefficient of determination: {r_sq}")
print(f"intercept: {model.intercept_}")
print(f"slope: {model.coef_}")
    coefficient of determination: 0.6817815162467973
    intercept: -1.6450878163398777
    slope: [1.04538607]
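To read these coefficients, the fitted line can be applied directly; a small sketch using the slope and intercept printed above:

```python
# Coefficients printed by the regression above.
slope, intercept = 1.04538607, -1.6450878163398777

def fitted(pred_per_game):
    """Map a predicted points-allowed-per-game value through the fitted line."""
    return slope * pred_per_game + intercept

# A team predicted to allow 25.0 points per game maps to roughly 24.49 actual.
print(round(fitted(25.0), 2))
```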
    
    [104]:
     
yr_proj = df['Point Projection']
preg = df['Pred']/17  # convert the full-season prediction to a per-game figure (17-game season)
### Projection and Prediction Comparison

First, display the difference between the prediction and the projected points allowed. In this model it is appropriate to do this for both full-year and per-game data to test that they share the same correlation coefficient. The per-game data is more valuable, as it makes it easier to identify how different the predictions and projections are: points in the NFL usually come in increments of 3 or 7, while 2, 6 and 8 also occur but are less common and situational, which cannot be controlled for by this model.

The sum of all absolute differences and the sum of squared differences are also provided.
    [107]:
     
    print("difference:", pre - yr_proj)
    print("Sum of All Differences:", np.sum(np.abs(pre - yr_proj)))
    print("Sum of Squared Differences:", np.sum(np.square(pre - yr_proj)))
    print("correlation:", np.corrcoef(np.array((pre, yr_proj)))[0, 1])
    difference: 0    -23.618571
    1      1.754286
    2    -24.890000
    3     15.360000
    4     21.802857
    5     35.798571
    6     12.878571
    7    -17.994286
    8     54.258571
    9     31.384286
    10   -71.386667
    11    60.948571
    12    18.423333
    13    58.990000
    14    25.075714
    15   -11.514286
    16    12.510000
    17   -54.880000
    18   -22.130000
    19     8.245714
    20    71.896667
    21    10.878571
    22   -51.614286
    23    99.705714
    24   -18.304286
    25   -39.300000
    26     2.641429
    27     5.060000
    28    -3.814286
    29    48.357143
    30    42.423333
    31    48.602857
    dtype: float64
    Sum of All Differences: 1026.4428569
    Sum of Squared Differences: 51029.10812112315
    correlation: 0.8250897929685962
    
    [108]:
     
    print("difference:", preg - pg)
    print("Sum of All Differences:", np.sum(np.abs(preg - pg)))
    print("Sum of Squared Differences:", np.sum(np.square(preg - pg)))
    print("correlation:", np.corrcoef(np.array((preg, pg)))[0, 1])
    difference: 0    -1.346471
    1     0.131765
    2    -1.464118
    3     0.903529
    4     1.268235
    5     2.062941
    6     0.714706
    7    -1.087059
    8     3.148824
    9     1.874706
    10   -4.165882
    11    3.542353
    12    1.117059
    13    3.470000
    14    1.446471
    15   -0.705882
    16    0.735882
    17   -3.228235
    18   -1.301765
    19    0.456471
    20    4.195882
    21    0.597059
    22   -3.064706
    23    5.836471
    24   -1.105294
    25   -2.311765
    26    0.198235
    27    0.297647
    28   -0.252941
    29    2.858824
    30    2.528824
    31    2.844706
    dtype: float64
    Sum of All Differences: 60.26470588235294
    Sum of Squared Differences: 175.26139377162627
    correlation: 0.825700621440239
    
### New DataFrame

Make a new dataframe that displays the difference for each team.

The difference used is the prediction minus the projection.

Therefore, a negative number suggests that a team is conceding less than predicted, and a positive number suggests that a team is conceding more points than predicted.

This model has shown that 8 teams out of 32 deviated from the prediction by more than 3 points per game (the value of a field goal), and none are 7 or further away (the value of a touchdown).

This can be considered a success, as the majority of teams' points allowed are within a field goal of the prediction.

In addition, situational factors could explain some of the deviance. For example, a team trailing late in a game may attempt a 4th-down conversion instead of a field goal and score a touchdown, a choice it would be unlikely to make earlier in the game. Similarly, turnovers and poor field position are factors not included in this model that could also explain the deviance.

    [116]:
     
Teams = ('Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens', 'Buffalo Bills',
         'Carolina Panthers', 'Chicago Bears', 'Cincinnati Bengals', 'Cleveland Browns',
         'Dallas Cowboys', 'Denver Broncos', 'Detroit Lions', 'Green Bay Packers',
         'Houston Texans', 'Indianapolis Colts', 'Jacksonville Jaguars', 'Kansas City Chiefs',
         'Las Vegas Raiders', 'Los Angeles Chargers', 'Los Angeles Rams', 'Miami Dolphins',
         'Minnesota Vikings', 'New England Patriots', 'New Orleans Saints', 'New York Giants',
         'New York Jets', 'Philadelphia Eagles', 'Pittsburgh Steelers', 'San Francisco 49ers',
         'Seattle Seahawks', 'Tampa Bay Buccaneers', 'Tennessee Titans', 'Washington Commanders')
df1 = pd.DataFrame({'Teams': Teams})
diff = preg - pg
df1['P/G Diff'] = diff
df1
    [116]:
    Teams P/G Diff
    0 Arizona Cardinals -1.346471
    1 Atlanta Falcons 0.131765
    2 Baltimore Ravens -1.464118
    3 Buffalo Bills 0.903529
    4 Carolina Panthers 1.268235
    5 Chicago Bears 2.062941
    6 Cincinnati Bengals 0.714706
    7 Cleveland Browns -1.087059
    8 Dallas Cowboys 3.148824
    9 Denver Broncos 1.874706
    10 Detroit Lions -4.165882
    11 Green Bay Packers 3.542353
    12 Houston Texans 1.117059
    13 Indianapolis Colts 3.470000
    14 Jacksonville Jaguars 1.446471
    15 Kansas City Chiefs -0.705882
    16 Las Vegas Raiders 0.735882
    17 Los Angeles Chargers -3.228235
    18 Los Angeles Rams -1.301765
    19 Miami Dolphins 0.456471
    20 Minnesota Vikings 4.195882
    21 New England Patriots 0.597059
    22 New Orleans Saints -3.064706
    23 New York Giants 5.836471
    24 New York Jets -1.105294
    25 Philadelphia Eagles -2.311765
    26 Pittsburgh Steelers 0.198235
    27 San Francisco 49ers 0.297647
    28 Seattle Seahawks -0.252941
    29 Tampa Bay Buccaneers 2.858824
    30 Tennessee Titans 2.528824
    31 Washington Commanders 2.844706
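The field-goal claim above can be checked directly from the `P/G Diff` column (values copied from the output; 3 points is a field goal, 7 a touchdown):

```python
# Per-game differences copied from the P/G Diff column above.
diffs = [-1.346471, 0.131765, -1.464118, 0.903529, 1.268235, 2.062941,
         0.714706, -1.087059, 3.148824, 1.874706, -4.165882, 3.542353,
         1.117059, 3.470000, 1.446471, -0.705882, 0.735882, -3.228235,
         -1.301765, 0.456471, 4.195882, 0.597059, -3.064706, 5.836471,
         -1.105294, -2.311765, 0.198235, 0.297647, -0.252941, 2.858824,
         2.528824, 2.844706]

beyond_field_goal = sum(abs(d) > 3 for d in diffs)   # more than a field goal away
beyond_touchdown = sum(abs(d) >= 7 for d in diffs)   # a touchdown or more away
print(beyond_field_goal, beyond_touchdown)  # 8 0
```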
### Bar Chart

Create a bar chart to show the variation in the prediction compared to the projection for each team.

    [113]:
     
    plt.figure(figsize=(10,5))
    plt.bar(df1['Teams'], df1['P/G Diff'],color = ['firebrick','darkred','purple','blue','cyan', 'orange', 'brown','darkorange', 'midnightblue', 'darkorange', 'cornflowerblue', 'forestgreen', 'darkblue', 'blue', 'turquoise', 'red','black','dodgerblue','blue','aqua', 'darkviolet','midnightblue','black', 'blue','darkgreen','lime','black','darkred','darkslategray','darkred','darkturquoise','maroon' ])
    plt.axhline(y = 0, color = 'black', linestyle = '-')
    ​
    plt.xticks(rotation=90)
plt.show()
    [41]:
     
    df1.to_csv(r'C:\Users\Rob\Documents\DPredDiff.csv', index = False)